Abstract

Practical applications of nonparametric density estimators in more 
than three dimensions suffer a great deal from the well-known curse 
of dimensionality: convergence slows down as dimension increases. 

We show that one can evade the curse of dimensionality by assuming
a simplified vine copula model for the dependence between variables.

We formulate a general nonparametric estimator for such a model and
show under high-level assumptions that the speed of convergence is
independent of dimension. We further discuss a particular implementation
for which we validate the high-level assumptions and establish
its asymptotic normality. Simulation experiments illustrate a large
gain in finite sample performance when the simplifying assumption
is at least approximately true. But even when it is severely violated,
the vine copula based approach proves advantageous as soon as more
than a few variables are involved. Lastly, we give an application of
the estimator to a classification problem from astrophysics.

Keywords: Classification, copula, dependence, kernel density estimation,
pair-copula construction, vine copula

1. Introduction 

Density estimation is one of the most important problems in nonparametric statistics.
Most commonly, nonparametric density estimators are used for exploratory
data analysis, but they also find many further applications in fields such as
astrophysics, forensics, or biology [7, 4, 32]. Many of these applications involve
the estimation of multivariate densities. However, most applications so far focus
on two- or three-dimensional problems. Furthermore, the persistent interest amongst
practitioners is contrasted by a falling tide of methodological contributions in the
last two decades.

A probable reason is the prevalence of the curse of dimensionality: due to the
sparseness of the data, nonparametric density estimators converge more slowly
to the true density as the dimension increases. Put differently, the number of
observations required for sufficiently accurate estimates grows excessively with
the dimension. As a result, there is very little benefit from the ever-growing sample
sizes in modern data. Section 7.2 in [44] illustrates this phenomenon for a kernel
density estimator when the standard Gaussian is the target density: to achieve
an accuracy comparable to $n = 50$ observations in one dimension, more than
$n = 10^6$ observations are required in ten dimensions.

In general, this issue cannot be solved: Stone [48] proved that any estimator $\hat f$
that is consistent for the class of $p$ times continuously differentiable $d$-dimensional
density functions converges at a rate of at most $n^{-p/(2p+d)}$. More precisely,

$$\hat f(x) = f(x) + O_p(n^{-r})$$

for all densities $f$ of this class and some $r > 0$ implies that $r \le p/(2p+d)$. The
curse of dimensionality manifests itself in the $d$ in the denominator. It implies that
the optimal convergence rate necessarily decreases in higher dimensions. Thus, 
to evade the curse of dimensionality, all we can hope for is to find subclasses of 
densities for which the optimal convergence rate does not depend on d. One such 
subclass is the density functions corresponding to independent variables, which 
can be estimated as a simple product of univariate density estimates. But the 
independence assumption is very restrictive. We also want the subclass to be 
rich and flexible. We will show that simplified vine densities are such a class and 
provide a useful approximation even when the simplifying assumption is severely 
violated. 

1.1. Nonparametric density estimation based on simplified vine copulas

We introduce a nonparametric density estimator whose convergence speed is
independent of the dimension. The estimator is built on the foundation of a
simplified vine copula model, where the joint density is decomposed into a product 
of marginal densities and bivariate copula densities, see, e.g., [12] and Section 3.9 
in [29]. 

First, we separate the marginal densities and the copula density (which captures
the dependence between variables). Let $(X_1,\dots,X_d)\in\mathbb{R}^d$ be a random vector
with joint distribution $F$ and marginal distributions $F_1,\dots,F_d$. Provided densities
exist, Sklar's Theorem [45] allows us to rewrite the joint density $f$ as the product
of a copula density $c$ and the marginal densities $f_1,\dots,f_d$: for all $x\in\mathbb{R}^d$,

$$f(x) = c\{F_1(x_1),\dots,F_d(x_d)\}\times f_1(x_1)\times\cdots\times f_d(x_d),$$

where $c$ is the density of the random vector $(F_1(X_1),\dots,F_d(X_d))\in[0,1]^d$. In
order to estimate the joint density $f$, we can therefore obtain estimates of the
marginal densities $f_1,\dots,f_d$ and the copula density $c$ separately, and then plug
them into the above formula. With respect to the curse of dimensionality, nothing
is gained (so far) since estimation of the copula density is still a $d$-dimensional
problem.

A crucial insight is that any $d$-dimensional copula density can be decomposed
into a product of $d(d-1)/2$ bivariate (conditional) copula densities [5]. Equivalently,
one can build arbitrary $d$-dimensional copula densities by using $d(d-1)/2$
building blocks (so-called pair-copulas). Following this idea, the flexible class of
vine copula models, also known as pair-copula constructions (PCCs), was
introduced in [1] and has seen rapidly increasing interest in recent years. For
instance, a three-dimensional joint density can be decomposed as

$$f(x_1,x_2,x_3) = c_{1,2}\{F_1(x_1),F_2(x_2)\}\times c_{2,3}\{F_2(x_2),F_3(x_3)\}$$
$$\quad\times\, c_{1,3;2}\{F_{1|2}(x_1|x_2),F_{3|2}(x_3|x_2);x_2\}$$
$$\quad\times\, f_1(x_1)\times f_2(x_2)\times f_3(x_3),$$

where $c_{1,3;2}\{F_{1|2}(x_1|x_2),F_{3|2}(x_3|x_2);x_2\}$ is the joint density corresponding to the
conditional random vector $(F_{1|2}(X_1|X_2),F_{3|2}(X_3|X_2))\mid X_2 = x_2$. Note that the
copula of this vector depends on the value $x_2$ of the conditioning variable $X_2$.
To reduce the complexity of the model, it is usually assumed that the influence
of the conditioning variable on the copula can be ignored. In this case, the
conditional density $c_{1,3;2}$ collapses to an unconditional and, most importantly,
two-dimensional object, and one speaks of the simplifying assumption or a
simplified vine copula model/PCC. For general dimension $d$, a similar decomposition
into the product of $d$ marginal densities and $d(d-1)/2$ pair-copula densities
holds.

Some copula classes where the simplifying assumption is satisfied are given
in [47]. An important special case is the Gaussian copula. It is the dependence
structure underlying a multivariate Gaussian distribution and can be fully characterized
by $d(d-1)/2$ partial correlations. Note that under a multivariate Gaussian
model, conditional correlations and partial correlations coincide. This property
is in direct correspondence to the simplifying assumption, which states that all
conditional copulas collapse to partial copulas. When the Gaussian copula is
represented as a vine copula, it consists of $d(d-1)/2$ Gaussian pair-copulas
where the copula parameter of each pair corresponds to the associated partial
correlation. In a general simplified vine copula model, we replace each Gaussian
pair-copula by an arbitrary bivariate copula. Such models are extremely flexible
and encompass a wide range of dependence structures. The class of simplified
vine distributions is even more flexible, because it allows one to couple a simplified
vine copula model with arbitrary marginal distributions.
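For the Gaussian case this correspondence can be checked numerically. The following Python sketch (assuming NumPy and SciPy; the function names, the correlation matrix and the evaluation point are illustrative choices, not taken from the text) evaluates the three-dimensional decomposition of $f(x_1,x_2,x_3)$ from Section 1.1 with Gaussian pair-copulas and standard normal margins, using the partial correlation $\rho_{13;2}$ as the parameter of $c_{1,3;2}$, and compares the result with the trivariate normal density. It relies on the standard closed forms of the Gaussian copula density and of the conditional distribution $F_{1|2}$ under a Gaussian copula.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal


def gauss_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho)."""
    x, y = norm.ppf(u), norm.ppf(v)
    q = rho**2 * (x**2 + y**2) - 2.0 * rho * x * y
    return np.exp(-q / (2.0 * (1.0 - rho**2))) / np.sqrt(1.0 - rho**2)


def gauss_h(u, v, rho):
    """Conditional cdf P(U <= u | V = v) under a Gaussian copula."""
    return norm.cdf((norm.ppf(u) - rho * norm.ppf(v)) / np.sqrt(1.0 - rho**2))


def vine_density_3d(x, r12, r23, r13):
    """Simplified vine density with Gaussian pair-copulas c_{1,2}, c_{2,3},
    c_{1,3;2} and standard normal margins f_1 = f_2 = f_3 = phi."""
    x1, x2, x3 = x
    u1, u2, u3 = norm.cdf(x1), norm.cdf(x2), norm.cdf(x3)
    # parameter of c_{1,3;2}: partial correlation of X1 and X3 given X2
    r13_2 = (r13 - r12 * r23) / np.sqrt((1.0 - r12**2) * (1.0 - r23**2))
    u1_2 = gauss_h(u1, u2, r12)   # F_{1|2}(x1 | x2)
    u3_2 = gauss_h(u3, u2, r23)   # F_{3|2}(x3 | x2)
    copula = (gauss_copula_density(u1, u2, r12)
              * gauss_copula_density(u2, u3, r23)
              * gauss_copula_density(u1_2, u3_2, r13_2))
    return copula * norm.pdf(x1) * norm.pdf(x2) * norm.pdf(x3)


# illustrative positive definite correlation matrix and evaluation point
R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, -0.2],
              [0.3, -0.2, 1.0]])
x = (0.4, -1.1, 0.7)
vine = vine_density_3d(x, R[0, 1], R[1, 2], R[0, 2])
exact = multivariate_normal(mean=np.zeros(3), cov=R).pdf(x)
print(vine, exact)   # the two numbers agree up to floating-point error
```

Because the Gaussian copula satisfies the simplifying assumption, the two values coincide. Replacing, say, $c_{1,3;2}$ by a non-Gaussian pair-copula would still give a valid simplified vine density, just no longer a Gaussian one.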

Under the simplifying assumption, a $d$-dimensional copula density can be
decomposed into $d(d-1)/2$ unconditional bivariate densities. Consequently, the
estimation of a $d$-dimensional copula density can be subdivided into the estimation
of $d(d-1)/2$ two-dimensional copula densities. Intuitively, we expect that
the convergence rate of such an estimator will be equal to the rate of a two-dimensional
estimator and, thus, that there is no curse of dimensionality. This is
formally established by our main result, Theorem 1.

Nonparametric estimation of simplified vine copula densities has been discussed 
earlier using kernels [34] and smoothing splines [30]. However, both contributions 
lack an analysis of the asymptotic behavior of the estimators. We treat the 
more general setting of densities with arbitrary support. Theorem 1 shows under 
high-level conditions that the convergence rate of a nonparametric estimator of a 
simplified vine density is independent of the dimension, an extremely powerful
property that has been overlooked so far.

1.2. Organization 

The remainder is structured as follows: Section 2 gives a review of vine copulas
and introduces notation. A general nonparametric estimator of simplified vine
densities is described in detail in Section 3. In Section 4 we show under high-level
assumptions that such an estimator is consistent and that the convergence rate
is independent of the dimension. Hence, there is no curse of dimensionality. In
Section 5 we discuss how the method can be implemented as a kernel estimator.
For this particular implementation, we validate the high-level assumptions of
Theorem 1 and establish asymptotic normality. We illustrate its advantages via
simulations in the simplified as well as the non-simplified setting (Section 6). The
method is applied to a classification problem from astrophysics in Section 7.
We conclude with a discussion of our results and provide links to the existing
literature on the simplifying assumption in Section 8.

2. Simplified vine copulas and distributions 

We will briefly recall the most important facts about vine copulas and the closely
related vine distributions. For a more extensive introduction we refer to [1, 12] 
and Chapter 3 of [29]. 

Vine copula models follow the idea of Joe [28] that any $d$-dimensional copula can
be expressed in terms of $d(d-1)/2$ bivariate (conditional) copulas. Because such
a decomposition is not unique, [6] introduces a graphical method to organize the
structure of a $d$-dimensional vine copula in terms of linked trees $T_m = (V_m, E_m)$,
$m = 1,\dots,d-1$. A sequence $\mathcal{V} := (T_1,\dots,T_{d-1})$ of trees is called a regular vine
(R-vine) tree sequence on $d$ elements if the following conditions are satisfied:

(i) $T_1$ is a tree with nodes $V_1 = \{1,\dots,d\}$ and edges $E_1$.

(ii) For $m \ge 2$, $T_m$ is a tree with nodes $V_m = E_{m-1}$ and edges $E_m$.

(iii) (Proximity condition) Whenever two nodes in $T_{m+1}$ are joined by an edge,
the corresponding edges in $T_m$ must share a common node.

[Figure 1: Example of a regular vine tree sequence (panels $T_1$, $T_2$, $T_3$, $T_4$).]

The tree sequence is also called the structure of the vine. An example of an 
R-vine tree sequence for d = 5 is given in Figure 1. For the annotation of the 
edges in each tree we follow [12]. 

An R-vine copula model identifies each edge of the trees with a bivariate copula
(a so-called pair-copula). Assume that each pair-copula admits a density and let
$\mathcal{B} := \{c_{j_e,k_e;D_e} : e\in E_m,\ 1\le m\le d-1\}$ be the set of copula densities associated
with the edges in $\mathcal{V}$. Then, the R-vine copula density can be written as

$$c(u) = \prod_{m=1}^{d-1}\prod_{e\in E_m} c_{j_e,k_e;D_e}\{G_{j_e|D_e}(u_{j_e}|u_{D_e}),\ G_{k_e|D_e}(u_{k_e}|u_{D_e});\ u_{D_e}\}, \qquad (1)$$

where $u_{D_e} := (u_\ell)_{\ell\in D_e}$ is a subvector of $u = (u_1,\dots,u_d)\in[0,1]^d$ and $G_{j_e|D_e}$ is the
conditional distribution of $U_{j_e}\mid U_{D_e} = u_{D_e}$. The set $D_e$ is called the conditioning set
and the indices $j_e, k_e$ form the conditioned set. In the first tree the conditioning
set $D_e$ is empty, and we define $G_{j_e}(u_{j_e}) := u_{j_e}$, $G_{k_e}(u_{k_e}) := u_{k_e}$ for notational
consistency. For a given edge $e$, the function $c_{j_e,k_e;D_e}$ is the copula density
associated with the conditional random vector

$$\big(G_{j_e|D_e}(U_{j_e}|U_{D_e}),\ G_{k_e|D_e}(U_{k_e}|U_{D_e})\big) \mid U_{D_e} = u_{D_e}.$$

Note that in (1), the pair-copula density $c_{j_e,k_e;D_e}$ takes $u_{D_e}$ as an argument and
the functional form w.r.t. the arguments $u_{j_e}, u_{k_e}$ may be different for each value of
$u_{D_e}$. This conditional structure makes the model very complex and complicates
estimation. To simplify matters, we assume that this dependence can be ignored
and the copula is equal across all possible values of $u_{D_e}$, i.e., we assume that the
simplifying assumption holds. In this case, (1) collapses to

$$c(u) = \prod_{m=1}^{d-1}\prod_{e\in E_m} c_{j_e,k_e;D_e}\{G_{j_e|D_e}(u_{j_e}|u_{D_e}),\ G_{k_e|D_e}(u_{k_e}|u_{D_e})\}. \qquad (2)$$

A distribution whose copula density can be represented this way is called a
simplified vine distribution.

Example 1. The density of a simplified R-vine copula corresponding to the tree
sequence in Figure 1 is

$$c(u_1,\dots,u_5) = c_{1,2}(u_1,u_2)\times c_{1,3}(u_1,u_3)\times c_{3,4}(u_3,u_4)\times c_{3,5}(u_3,u_5)$$
$$\quad\times\, c_{2,3;1}(u_{2|1},u_{3|1})\times c_{1,4;3}(u_{1|3},u_{4|3})\times c_{1,5;3}(u_{1|3},u_{5|3})$$
$$\quad\times\, c_{2,4;1,3}(u_{2|1,3},u_{4|1,3})\times c_{4,5;1,3}(u_{4|1,3},u_{5|1,3})$$
$$\quad\times\, c_{2,5;1,3,4}(u_{2|1,3,4},u_{5|1,3,4}),$$

where we used the abbreviation $u_{j_e|D_e} := G_{j_e|D_e}(u_{j_e}|u_{D_e})$.
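The labels in Example 1 follow mechanically from the tree sequence: for an edge joining two nodes whose variable sets are $A$ and $B$, the conditioning set is $A\cap B$ and the conditioned set is the symmetric difference of $A$ and $B$. The Python sketch below encodes the tree sequence of Figure 1 as it can be read off from Example 1 (an assumption, since the figure itself is not reproduced here), checks conditions (i) to (iii), and recovers the pair-copula labels. All function names are hypothetical.

```python
# Tree sequence of Figure 1 as read off from Example 1 (assumed structure).
# Nodes of T_{m+1} are the edges of T_m, so higher-tree edges are nested tuples.
T1 = [(1, 2), (1, 3), (3, 4), (3, 5)]
T2 = [(T1[0], T1[1]), (T1[1], T1[2]), (T1[1], T1[3])]
T3 = [(T2[0], T2[1]), (T2[1], T2[2])]
T4 = [(T3[0], T3[1])]
TREES = [T1, T2, T3, T4]


def variables(node):
    """Set of original variables contained in a (possibly nested) node."""
    if isinstance(node, int):
        return frozenset([node])
    return variables(node[0]) | variables(node[1])


def label(edge):
    """(conditioned set, conditioning set) of an edge, as sorted lists."""
    a, b = variables(edge[0]), variables(edge[1])
    return sorted(a ^ b), sorted(a & b)


def is_tree(nodes, edges):
    """A graph is a tree iff it has |V| - 1 edges and is connected."""
    nodes = list(nodes)
    if len(edges) != len(nodes) - 1:
        return False
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(nodes)


def is_rvine(d, trees):
    """Check conditions (i)-(iii) of an R-vine tree sequence on d elements."""
    if len(trees) != d - 1:
        return False
    nodes = list(range(1, d + 1))          # condition (i): V_1 = {1, ..., d}
    for m, edges in enumerate(trees):
        if not is_tree(nodes, edges):      # conditions (i) and (ii)
            return False
        # condition (iii): joined nodes (edges of T_m) must share a node
        if m > 0 and any(not set(a) & set(b) for a, b in edges):
            return False
        nodes = edges                      # V_{m+1} = E_m
    return True


print(is_rvine(5, TREES))
for m, edges in enumerate(TREES, start=1):
    for edge in edges:
        conditioned, conditioning = label(edge)
        print(f"T{m}:", conditioned, conditioning)
```

For instance, the single edge of $T_4$ joins nodes with variable sets $\{1,2,3,4\}$ and $\{1,3,4,5\}$, which yields the label $2,5;1,3,4$ used above.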

R-vine copula densities involve conditional distributions $G_{j_e|D_e}$. We can express
them in terms of conditional distributions corresponding to bivariate copulas in
$\mathcal{B}$ as follows: Let $\ell_e\in D_e$ be another index such that $c_{j_e,\ell_e;D_e\setminus\ell_e}\in\mathcal{B}$ and define
$D_e' := D_e\setminus\ell_e$. Then, we can write

$$G_{j_e|D_e}(u_{j_e}|u_{D_e}) = h_{j_e|\ell_e;D_e'}\{G_{j_e|D_e'}(u_{j_e}|u_{D_e'})\mid G_{\ell_e|D_e'}(u_{\ell_e}|u_{D_e'})\}, \qquad (3)$$

where the h-function is defined as

$$h_{j_e|\ell_e;D_e'}(u|v) := \int_0^u c_{j_e,\ell_e;D_e'}(s,v)\,ds, \quad\text{for } (u,v)\in[0,1]^2. \qquad (4)$$

By definition, h-functions are conditional distribution functions for pairs of
marginally uniformly distributed random variables with joint density $c_{j_e,\ell_e;D_e'}$.
The arguments $G_{j_e|D_e'}(u_{j_e}|u_{D_e'})$ and $G_{\ell_e|D_e'}(u_{\ell_e}|u_{D_e'})$ of the h-function in (3) can
be rewritten in the same manner. In each step of this recursion the conditioning
set $D_e$ is reduced by one element. Note also that, by construction, the copula
density on the right hand side of (4) always belongs to the set $\mathcal{B}$. Eventually, this
allows us to write any of the conditional distributions $G_{j_e|D_e}$ as a recursion over
h-functions that are directly linked to the pair-copula densities. Later, we will
use this fact to derive estimates of such conditional distributions from estimates
of the pair-copula densities in lower trees.

Example 2. Consider an R-vine copula corresponding to the R-vine tree sequence
given in Figure 1. We have

$$G_{3|1,2}(u_3|u_1,u_2) = h_{3|2;1}\{h_{3|1}(u_3|u_1)\mid h_{2|1}(u_2|u_1)\},$$

where $h_{3|2;1}(u_{3|1}|u_{2|1}) = \int_0^{u_{3|1}} c_{2,3;1}(u_{2|1},s)\,ds$, $h_{3|1}(u_3|u_1) = \int_0^{u_3} c_{1,3}(u_1,s)\,ds$, and
$h_{2|1}(u_2|u_1) = \int_0^{u_2} c_{1,2}(u_1,s)\,ds$.
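The recursion in Example 2 can be checked numerically when all pair-copulas are Gaussian, because every Gaussian h-function has the closed form $h(u|v) = \Phi\{(\Phi^{-1}(u) - \rho\,\Phi^{-1}(v))/\sqrt{1-\rho^2}\}$ and $G_{3|1,2}$ must then agree with the conditional normal distribution of $X_3$ given $(X_1,X_2)$ for standard normal margins. The sketch below (NumPy/SciPy) also checks definition (4) for $h_{3|1}$ by numerical integration. The Gaussian choice, the parameter values and the function names are assumptions for illustration; the parameter of $c_{2,3;1}$ is the partial correlation $\rho_{23;1}$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def c_gauss(u, v, rho):
    """Gaussian pair-copula density c(u, v; rho)."""
    x, y = norm.ppf(u), norm.ppf(v)
    q = rho**2 * (x**2 + y**2) - 2.0 * rho * x * y
    return np.exp(-q / (2.0 * (1.0 - rho**2))) / np.sqrt(1.0 - rho**2)


def h_gauss(u, v, rho):
    """Closed-form Gaussian h-function h(u | v) = int_0^u c(v, s) ds."""
    return norm.cdf((norm.ppf(u) - rho * norm.ppf(v)) / np.sqrt(1.0 - rho**2))


# correlations of (X1, X2, X3); c_{2,3;1} gets the partial correlation rho_{23;1}
r12, r13, r23 = 0.6, 0.4, 0.1
r23_1 = (r23 - r12 * r13) / np.sqrt((1.0 - r12**2) * (1.0 - r13**2))
u1, u2, u3 = 0.3, 0.8, 0.55

# definition (4): h_{3|1}(u3 | u1) as the integral of c_{1,3}(u1, s) over [0, u3]
h31_quad, _ = quad(lambda s: c_gauss(u1, s, r13), 0.0, u3)
h31 = h_gauss(u3, u1, r13)

# Example 2: G_{3|1,2}(u3 | u1, u2) = h_{3|2;1}{h_{3|1}(u3 | u1) | h_{2|1}(u2 | u1)}
G3_12 = h_gauss(h31, h_gauss(u2, u1, r12), r23_1)

# reference: conditional normal distribution of X3 given (X1, X2), X_l = Phi^{-1}(U_l)
S12 = np.array([[1.0, r12], [r12, 1.0]])
s3 = np.array([r13, r23])
beta = np.linalg.solve(S12, s3)
x12 = norm.ppf([u1, u2])
cond_mean, cond_var = beta @ x12, 1.0 - s3 @ beta
G3_12_exact = norm.cdf((norm.ppf(u3) - cond_mean) / np.sqrt(cond_var))
print(h31, h31_quad, G3_12, G3_12_exact)
```

Each h-function call reduces the conditioning set by one element, exactly as in the recursion described above; only the last step uses a second-tree pair-copula.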


Altogether, we can express any vine copula density in terms of bivariate copula
densities and corresponding h-functions.

3. A nonparametric density estimator based on simplified vine copulas

We propose a multivariate nonparametric density estimation technique where a)
we separate the estimation of marginal and copula densities, and b) the copula
density is estimated as the product of sequentially estimated pair-copula densities.
We suggest a general step-wise estimation algorithm without specifying exactly
how the components are estimated. This more practical issue is deferred to
Section 5.

Let $X = (X_1,\dots,X_d)\in\Omega_X$ be a random vector with continuous joint distribution
$F$ and marginal distributions $F_1,\dots,F_d$. The support of $X_\ell$ will be
denoted as $\Omega_{X_\ell}$, $\ell = 1,\dots,d$. Let further $(X_1^{(i)},\dots,X_d^{(i)})$, $i = 1,\dots,n$, be
iid copies of $X$ (acting as observations). Assume that $F$ is a simplified vine distribution
with structure $\mathcal{V} = (T_1,\dots,T_{d-1})$. Provided densities exist, we can use
Sklar's theorem and (2) to write the joint density $f$ for all $x = (x_1,\dots,x_d)\in\Omega_X$
as

$$f(x) = c\{F_1(x_1),\dots,F_d(x_d)\}\times\prod_{\ell=1}^d f_\ell(x_\ell)$$
$$\quad= \prod_{m=1}^{d-1}\prod_{e\in E_m} c_{j_e,k_e;D_e}\{F_{j_e|D_e}(x_{j_e}|x_{D_e}),\ F_{k_e|D_e}(x_{k_e}|x_{D_e})\}\times\prod_{\ell=1}^d f_\ell(x_\ell). \qquad (5)$$

The conditional distribution functions $F_{k_e|D_e}(x_{k_e}|x_{D_e})$ can equivalently be expressed
as $G_{k_e|D_e}(u_{k_e}|u_{D_e})$, where $u = (u_1,\dots,u_d) := (F_1(x_1),\dots,F_d(x_d))$. This
allows us to decompose $F_{k_e|D_e}$ recursively into h-functions (see Section 2).

The idea is now to estimate all functions in the above expression separately. We 
use a step-wise estimation procedure that is widely used in vine copula models, 
see, e.g., [1, 25]. It is summarized in Algorithm 1. Let us describe the reasoning 
behind the first few steps in a little more detail.

1. Based on the observations $(X_1^{(i)},\dots,X_d^{(i)})$, $i = 1,\dots,n$, we obtain estimates
$\hat f_1,\dots,\hat f_d$, $\hat F_1,\dots,\hat F_d$ of the marginal densities $f_1,\dots,f_d$ and distribution
functions $F_1,\dots,F_d$.

2. The copula density $c$ is the density of the random vector $U := (F_1(X_1),\dots,F_d(X_d))$.
We do not have access to observations from this vector. However, we can
define pseudo-observations $\hat U^{(i)} := (\hat U_1^{(i)},\dots,\hat U_d^{(i)})$ by replacing $F_1,\dots,F_d$
with the estimators from the last step:

$$(\hat U_1^{(i)},\dots,\hat U_d^{(i)}) := (\hat F_1(X_1^{(i)}),\dots,\hat F_d(X_d^{(i)})),\quad i = 1,\dots,n. \qquad (6)$$

Based on two-dimensional subvectors of the pseudo-observations (6), we
estimate all pair-copula densities and h-functions that correspond to edges
of the first tree (the conditioning sets $D_e$ are empty). We use (4) to derive
estimates of the h-functions, that is

$$\hat h_{j_e|k_e}(u|v) := \int_0^u \hat c_{j_e,k_e}(s,v)\,ds, \quad\text{for } (u,v)\in(0,1)^2.$$

Optionally, the h-functions can be estimated separately. However, this will
typically lead to a density estimate that does not integrate to one.

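3. Any pair-copula density $c_{j_e,k_e;D_e}$ corresponding to an edge $e$ in the second
tree is the density of a random vector $(U_{j_e|D_e}, U_{k_e|D_e}) := (G_{j_e|D_e}(U_{j_e}|U_{D_e}),\ G_{k_e|D_e}(U_{k_e}|U_{D_e}))$,
$e\in E_2$. These vectors are not observable, but we can use pseudo-observations such
as

$$\hat U_{j_e|D_e}^{(i)} := \hat F_{j_e|D_e}(X_{j_e}^{(i)}|X_{D_e}^{(i)}) = \hat G_{j_e|D_e}(\hat U_{j_e}^{(i)}|\hat U_{D_e}^{(i)}) = \hat h_{j_e|D_e}(\hat U_{j_e}^{(i)}|\hat U_{D_e}^{(i)}),$$

$i = 1,\dots,n$, instead. This allows us to obtain estimates $\hat c_{j_e,k_e;D_e}$, $\hat h_{j_e|k_e;D_e}$,
and $\hat h_{k_e|j_e;D_e}$.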

4. For estimation in the third tree, we need observations from random vectors
such as

$$U_{j_e|D_e}^{(i)} := G_{j_e|D_e}(U_{j_e}^{(i)}|U_{D_e}^{(i)}), \qquad (7)$$

$i = 1,\dots,n$, $e\in E_3$. Recall from Section 2 that, by construction, we can
find some edge $e'\in E_2$ such that $j_{e'} = j_e$ and $D_{e'}\cup k_{e'} = D_e$. Consequently,
we can apply (3) and approximate (7) by the pseudo-observations

$$\hat U_{j_e|D_e}^{(i)} := \hat U_{j_{e'}|D_{e'}\cup k_{e'}}^{(i)} = \hat h_{j_{e'}|k_{e'};D_{e'}}\big(\hat U_{j_{e'}|D_{e'}}^{(i)}\mid\hat U_{k_{e'}|D_{e'}}^{(i)}\big),$$

where the last equality is again derived from (3).


5. For higher trees, proceed as in step 4. 


At the end of the procedure we have estimates for all marginal distributions/densities,
bivariate copula densities, and all h-functions that are required to evaluate the
R-vine density (5). For all $x\in\Omega_X$ we now define an estimate of the simplified
vine density $f$ as

$$\hat f_{\mathrm{vine}}(x) := \prod_{m=1}^{d-1}\prod_{e\in E_m}\hat c_{j_e,k_e;D_e}\{\hat F_{j_e|D_e}(x_{j_e}|x_{D_e}),\ \hat F_{k_e|D_e}(x_{k_e}|x_{D_e})\}\times\prod_{\ell=1}^d\hat f_\ell(x_\ell). \qquad (8)$$

Algorithm 1 Sequential estimation of simplified vine densities

Input: Observations $(X_1^{(i)},\dots,X_d^{(i)})$, $i = 1,\dots,n$, structure $\mathcal{V} = (T_1,\dots,T_{d-1})$.

Output: Estimates of all marginal densities and distributions, pair-copula
densities, and h-functions required to evaluate the simplified vine density (5).

for $\ell = 1,\dots,d$:
    Obtain estimates $\hat f_\ell$, $\hat F_\ell$ of the marginal density $f_\ell$ and distribution $F_\ell$.
    Set $\hat U_\ell^{(i)} := \hat F_\ell(X_\ell^{(i)})$, $i = 1,\dots,n$.
end for
for $m = 1,\dots,d-1$:
    for all $e\in E_m$:
        (i) Estimation step: Based on $(\hat U_{j_e|D_e}^{(i)}, \hat U_{k_e|D_e}^{(i)})$, $i = 1,\dots,n$
            (with $\hat U_{j_e|\emptyset}^{(i)} := \hat U_{j_e}^{(i)}$ in the first tree), obtain an estimate
            of the copula density $c_{j_e,k_e;D_e}$, which we denote as $\hat c_{j_e,k_e;D_e}$, and
            corresponding h-function estimates $\hat h_{j_e|k_e;D_e}$, $\hat h_{k_e|j_e;D_e}$.
        (ii) Transformation step: Set
            $\hat U_{j_e|D_e\cup k_e}^{(i)} := \hat h_{j_e|k_e;D_e}(\hat U_{j_e|D_e}^{(i)}\mid\hat U_{k_e|D_e}^{(i)})$,
            $\hat U_{k_e|D_e\cup j_e}^{(i)} := \hat h_{k_e|j_e;D_e}(\hat U_{k_e|D_e}^{(i)}\mid\hat U_{j_e|D_e}^{(i)})$, $i = 1,\dots,n$.
    end for
end for
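The following Python sketch runs Algorithm 1 for $d = 3$ with the D-vine structure $T_1$: 1-2, 2-3 and $T_2$: 1,3;2, and then evaluates (8). It is a simplified illustration under several assumptions that are not part of the method as stated: Gaussian kernels are used throughout (unlike the compactly supported kernels required in Section 5), bandwidths follow simple rules of thumb of order $n^{-1/5}$ (margins) and $n^{-1/6}$ (pair-copulas), the pair-copula densities follow the normal-score transformation idea of Section 5.2 with bandwidth matrix $bI_2$, and the h-functions are estimated separately by a smoothed Nadaraya-Watson estimator (the option mentioned in step 2) rather than by integrating the density estimate. All class and function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm


class MarginalKDE:
    """Gaussian-kernel estimates of a univariate density and cdf (step 1)."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)
        # rule-of-thumb bandwidth of order n^(-1/5) (illustrative choice)
        self.b = 1.06 * self.data.std() * self.data.size ** (-1 / 5)

    def _z(self, x):
        return (np.atleast_1d(x)[:, None] - self.data[None, :]) / self.b

    def pdf(self, x):
        return norm.pdf(self._z(x)).mean(axis=1) / self.b

    def cdf(self, x):
        return norm.cdf(self._z(x)).mean(axis=1)


class PairCopulaKDE:
    """Transformation estimator of a pair-copula density: Gaussian product
    kernel with bandwidth matrix b * I applied to the normal scores."""

    def __init__(self, u, v):
        self.x, self.y = norm.ppf(u), norm.ppf(v)
        self.b = self.x.size ** (-1 / 6)    # order n^(-1/6), constant 1

    def _k(self, s, scores):
        """Unnormalised kernel values phi((Phi^{-1}(s) - score_i) / b)."""
        return norm.pdf((norm.ppf(np.atleast_1d(s))[:, None] - scores) / self.b)

    def density(self, u, v):
        kx, ky = self._k(u, self.x), self._k(v, self.y)
        denom = norm.pdf(norm.ppf(u)) * norm.pdf(norm.ppf(v))
        return (kx * ky).mean(axis=1) / (self.b**2 * denom)

    def _nw(self, t, t_scores, w):
        """Smoothed Nadaraya-Watson conditional cdf with kernel weights w."""
        cdf = norm.cdf((norm.ppf(np.atleast_1d(t))[:, None] - t_scores) / self.b)
        return (cdf * w).sum(axis=1) / w.sum(axis=1)

    def h1(self, u, v):
        """Estimate of P(U <= u | V = v)."""
        return self._nw(u, self.x, self._k(v, self.y))

    def h2(self, u, v):
        """Estimate of P(V <= v | U = u)."""
        return self._nw(v, self.y, self._k(u, self.x))


def fit_vine3(X):
    """Algorithm 1 for d = 3 with T1: 1-2, 2-3 and T2: 1,3;2."""
    margins = [MarginalKDE(X[:, l]) for l in range(3)]
    U = np.column_stack([mg.cdf(X[:, l]) for l, mg in enumerate(margins)])
    c12 = PairCopulaKDE(U[:, 0], U[:, 1])      # tree 1: estimation step
    c23 = PairCopulaKDE(U[:, 1], U[:, 2])
    u1_2 = c12.h1(U[:, 0], U[:, 1])            # transformation step: U_{1|2}
    u3_2 = c23.h2(U[:, 1], U[:, 2])            # U_{3|2}
    c13_2 = PairCopulaKDE(u1_2, u3_2)          # tree 2: estimation step
    return margins, c12, c23, c13_2


def vine_pdf(fit, x):
    """Evaluate the estimate (8) at the rows of x."""
    margins, c12, c23, c13_2 = fit
    x = np.atleast_2d(x)
    u = np.column_stack([mg.cdf(x[:, l]) for l, mg in enumerate(margins)])
    marg = np.prod([mg.pdf(x[:, l]) for l, mg in enumerate(margins)], axis=0)
    cop = (c12.density(u[:, 0], u[:, 1]) * c23.density(u[:, 1], u[:, 2])
           * c13_2.density(c12.h1(u[:, 0], u[:, 1]), c23.h2(u[:, 1], u[:, 2])))
    return cop * marg


rng = np.random.default_rng(1)
R = np.array([[1.0, 0.5, 0.3], [0.5, 1.0, -0.2], [0.3, -0.2, 1.0]])
X = rng.multivariate_normal(np.zeros(3), R, size=1000)
fit = fit_vine3(X)
estimate = vine_pdf(fit, [0.0, 0.0, 0.0])[0]
truth = (2 * np.pi) ** -1.5 / np.sqrt(np.linalg.det(R))   # Gaussian density at 0
print(estimate, truth)
```

With these untuned bandwidths some smoothing bias is to be expected at this sample size; the sketch is only meant to show the data flow of Algorithm 1 (margins, first tree, transformation step, second tree).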


4. Asymptotic theory 

We now establish weak consistency of the simplified vine density estimator proposed
in Section 3. We furthermore show that its probabilistic convergence rate
does not depend on the dimension and, hence, there is no curse of dimensionality.

4.1. Consistency and rate of convergence 

The sequential nature of the proposed estimator complicates its analysis. Estimation
errors will propagate from one tree to the next and affect the estimation
in higher trees. We impose high-level assumptions on the uni- and bivariate
estimators that allow us to establish our main result.

The first assumption considers the consistency of univariate density and distribution
function estimators. Although estimators may converge at different rates,
we will formulate all assumptions w.r.t. the same rate $n^{-r}$, $r > 0$. This rate
then has to be the slowest among all estimators involved, typically the rate of
the pair-copula density estimator.


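Assumption A1. For all $\ell = 1,\dots,d$ and all $x_\ell\in\Omega_{X_\ell}$ it holds
(a) $\hat f_\ell(x_\ell) - f_\ell(x_\ell) = O_p(n^{-r})$, (b) $\sup_{x_\ell\in\Omega_{X_\ell}}|\hat F_\ell(x_\ell) - F_\ell(x_\ell)| = O_{a.s.}(n^{-r})$.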


Next, assume we are in an ideal situation where, for each edge $e\in E_m$, $m =
1,\dots,d-1$, we have access to the true (but unobservable) pair-copula samples

$$U_{j_e|D_e}^{(i)} := F_{j_e|D_e}(X_{j_e}^{(i)}|X_{D_e}^{(i)}), \qquad U_{k_e|D_e}^{(i)} := F_{k_e|D_e}(X_{k_e}^{(i)}|X_{D_e}^{(i)}), \qquad (9)$$

$i = 1,\dots,n$. Recall that estimators are functions of the data, although this
dependence is usually not made explicit in notation. Denote

$$\bar c_{j_e,k_e;D_e}(u,v) := \hat c_{j_e,k_e;D_e}\big(u, v, U_{j_e|D_e}^{(1)}, U_{k_e|D_e}^{(1)},\dots,U_{j_e|D_e}^{(n)}, U_{k_e|D_e}^{(n)}\big) \qquad (10)$$

as the oracle pair-copula density estimator that is based on the random samples
(9). The h-function estimators corresponding to (10) are denoted $\bar h_{j_e|k_e;D_e}$ and
$\bar h_{k_e|j_e;D_e}$. The second assumption requires the pair-copula density and h-function
estimators to be consistent in this ideal world. For the h-functions we need strong
uniform consistency on compact interior subsets of $[0,1]^2$. We further assume
that the errors from h-function estimation vanish at least at the rate $n^{-r}$.


Assumption A2. For all $e\in E_m$, $m = 1,\dots,d-1$, it holds:
(a) for all $(u,v)\in(0,1)^2$,

$$\bar c_{j_e,k_e;D_e}(u,v) - c_{j_e,k_e;D_e}(u,v) = O_p(n^{-r}),$$

(b) for every $\delta\in(0,0.5]$,

$$\sup_{(u,v)\in[\delta,1-\delta]^2}\big|\bar h_{j_e|k_e;D_e}(u|v) - h_{j_e|k_e;D_e}(u|v)\big| = O_{a.s.}(n^{-r}),$$
$$\sup_{(u,v)\in[\delta,1-\delta]^2}\big|\bar h_{k_e|j_e;D_e}(u|v) - h_{k_e|j_e;D_e}(u|v)\big| = O_{a.s.}(n^{-r}).$$


In practice, one has to replace (9) by pseudo-observations which have to be
estimated. Thus, we only have access to perturbed versions of the random
variables (9). Similar to a Lipschitz condition, the last assumption ensures
that the pair-copula and h-function estimators are not overly sensitive to such
perturbations. Denote

$$\hat c_{j_e,k_e;D_e}(u,v) := \hat c_{j_e,k_e;D_e}\big(u, v, \hat U_{j_e|D_e}^{(1)}, \hat U_{k_e|D_e}^{(1)},\dots,\hat U_{j_e|D_e}^{(n)}, \hat U_{k_e|D_e}^{(n)}\big) \qquad (11)$$

as the estimator based on the pseudo-observations defined in Algorithm 1. The
h-function estimators corresponding to (11) are denoted $\hat h_{j_e|k_e;D_e}$
and $\hat h_{k_e|j_e;D_e}$.

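Assumption A3. For all $e\in E_m$, $m = 1,\dots,d-1$, it holds:

(a) for all $(u,v)\in(0,1)^2$,

$$\hat c_{j_e,k_e;D_e}(u,v) - \bar c_{j_e,k_e;D_e}(u,v) = O_p(a_{e,n}),$$

(b) for every $\delta\in(0,0.5]$,

$$\sup_{(u,v)\in[\delta,1-\delta]^2}\big|\hat h_{j_e|k_e;D_e}(u|v) - \bar h_{j_e|k_e;D_e}(u|v)\big| = O_{a.s.}(a_{e,n}),$$
$$\sup_{(u,v)\in[\delta,1-\delta]^2}\big|\hat h_{k_e|j_e;D_e}(u|v) - \bar h_{k_e|j_e;D_e}(u|v)\big| = O_{a.s.}(a_{e,n}),$$

where

$$a_{e,n} := \sup_{i=1,\dots,n}\Big\{\big|\hat U_{j_e|D_e}^{(i)} - U_{j_e|D_e}^{(i)}\big| + \big|\hat U_{k_e|D_e}^{(i)} - U_{k_e|D_e}^{(i)}\big|\Big\}.$$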

Finally, we require the true pair-copula densities to be smooth. Note that
smoothness of pair-copula densities already guarantees smoothness of related
h-functions by (4).


Assumption A4. For all $e\in E_m$, $m = 1,\dots,d-1$, the pair-copula densities
$c_{j_e,k_e;D_e}$ are continuously differentiable on $(0,1)^2$.

Now we can state our theorem. The proof is deferred to Appendix A. 


Theorem 1. Let $f$ be a $d$-dimensional density corresponding to a simplified vine
distribution with structure $\mathcal{V} = (T_1,\dots,T_{d-1})$ and let $(X_1^{(i)},\dots,X_d^{(i)})$, $i = 1,\dots,n$,
be iid observations from this density. Denote further $\hat f_{\mathrm{vine}}$ as the estimator
resulting from Algorithm 1 with $(X_1^{(i)},\dots,X_d^{(i)})$, $i = 1,\dots,n$, and $\mathcal{V}$ as the input. Under
Assumptions A1-A4 it holds for all $x\in\Omega_X$,

$$\hat f_{\mathrm{vine}}(x) - f(x) = O_p(n^{-r}).$$


Usually, convergence of nonparametric density estimators slows down as dimension
increases. This phenomenon is widely known as the curse of dimensionality
and restricts the practical application of the estimators to very low-dimensional 
problems. By Theorem 1, the proposed vine copula based kernel density estimator 
inherits the convergence rate of the bivariate copula density estimator. It does not 
depend on the dimension d and, therefore, suffers no curse of dimensionality. This 
is a direct consequence of the simplifying assumption allowing us to subdivide the 
d-dimensional estimation problem into several one- and two-dimensional tasks. 

Assuming that the pair-copula densities are $p$ times continuously differentiable,
we can achieve convergence with $r = p/(2p+2)$. Recalling from [48] that a
general nonparametric density estimator has optimal rate $n^{-p/(2p+d)}$, we see that
the vine copula based estimator converges at a rate that is equivalent to the
rate of a two-dimensional classical estimator. As this property is independent of
dimension, we can expect large benefits of the vine copula approach especially in
higher dimensions. We emphasize that a necessary condition for Theorem 1 to
hold with $r = p/(2p+2)$ is that the density $f$ belongs to the class of simplified
vine densities. If this is not the case, the estimator described in Section 3 is
not consistent, but converges towards a simplified vine density that is merely
an approximation of the true density. More specifically, its limit is the partial
vine copula approximation, first defined in [46]. In Section 6 we will illustrate
that even in this situation an estimator based on simplified vine copulas can
outperform the classical approach on finite samples.

Remark 1. Theorem 1 allows for densities $f$ with arbitrary support. Their
support, $\Omega_X$, only relates to the marginal distributions; copulas are always supported
on $[0,1]^d$. If some of the $X_\ell$ have bounded support, we just have to use
estimators for $f_\ell$ that take this into account. This underlines how flexible the
vine copula based approach is.

Remark 2. It is straightforward to extend Theorem 1 to non-simplified vine 
densities by extending the pair-copula densities to functions of more than two 
variables. Besides that, the proof given in Appendix A does not make use of the 
simplifying assumption at all. However, the simplifying assumption is necessary
for $r = p/(2p+2)$ to be feasible. More generally, if we assume that the pair-copulas
depend on at most $d'$ conditioning variables, the optimal rate is $n^{-p/(2p+2+d')}$.
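The exponents discussed in the preceding paragraphs can be tabulated with a few lines of Python (standard library only; the helper names are hypothetical). The sketch also makes a small consistency check explicit: a pair-copula in a $d$-dimensional vine conditions on at most $d-2$ variables, and plugging $d' = d-2$ into $p/(2p+2+d')$ gives back Stone's exponent $p/(2p+d)$.

```python
from fractions import Fraction


def stone_exponent(p, d):
    """Optimal exponent r = p / (2p + d) for p-smooth d-dimensional densities [48]."""
    return Fraction(p, 2 * p + d)


def vine_exponent(p, d_prime=0):
    """Exponent p / (2p + 2 + d') when every pair-copula depends on at most d'
    conditioning variables; d' = 0 is the simplified case of Theorem 1."""
    return Fraction(p, 2 * p + 2 + d_prime)


# p = 2: the classical exponent shrinks with d, the simplified vine one does not
for d in (2, 3, 5, 10):
    print(d, stone_exponent(2, d), vine_exponent(2))
```

Exact fractions are used so that equalities such as $p/(2p+2) = p/(2p+d)$ for $d = 2$ can be compared without rounding.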

Remark 3. Theorem 1 can be extended to

$$\sup_{x\in\Omega_X}\big|\hat f_{\mathrm{vine}}(x) - f(x)\big| = O_p\{(\ln n/n)^r\},$$

provided that the rate $n^{-r}$ in our assumptions is replaced by $(\ln n/n)^r$ and holds
uniformly on $\Omega_{X_\ell}$ and $[0,1]^2$, respectively. But this requires that the pair-copula
densities are bounded, which is unusual. For example, it does not hold when $f$
is a multivariate Gaussian density with non-diagonal covariance matrix. If the
assumptions are met, $\hat f_{\mathrm{vine}}$ is able to achieve the optimal uniform rate of a two-dimensional
nonparametric density estimator, which is attained at $r = p/(2p+2)$
[see, 49].

Assumptions A1-A3 are very general and hold for a large class of estimators 
under mild regularity conditions. In Section 5 we validate them for a particular 
implementation which will be used in the simulations (Section 6). 
4.2. A note on the asymptotic distribution

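We also want to give a brief and general account of the asymptotic distribution
of the estimator. Let $d^* = d + d(d-1)/2$ and $\hat f^*(x)\in\mathbb{R}^{d^*}$ be the stacked vector
of all components of the product $\hat f_{\mathrm{vine}}(x)$ in Eq. (8), i.e.,

$$\hat f^*(x) := \big(\hat f_1(x_1), \hat f_2(x_2),\dots,\hat c_{j_e,k_e;D_e}\{\hat F_{j_e|D_e}(x_{j_e}|x_{D_e}), \hat F_{k_e|D_e}(x_{k_e}|x_{D_e})\},\dots\big),$$

and similarly $f^*(x)$. Then $\prod_{k=1}^{d^*}\hat f_k^*(x) = \hat f_{\mathrm{vine}}(x)$ and $\prod_{k=1}^{d^*} f_k^*(x) = f(x)$. The
following result is a simple application of the multivariate delta method.

Proposition 1. If for some $\mu_x\in\mathbb{R}^{d^*}$, $\Sigma_x\in\mathbb{R}^{d^*\times d^*}$,

$$n^r\{\hat f^*(x) - f^*(x)\}\xrightarrow{d}\mathcal{N}_{d^*}(\mu_x, \Sigma_x), \qquad (12)$$

then for all $x\in\mathbb{R}^d$,

$$n^r\{\hat f_{\mathrm{vine}}(x) - f(x)\}\xrightarrow{d}\mathcal{N}(\theta^\top\mu_x,\ \theta^\top\Sigma_x\theta),$$

where $\theta_k = \prod_{j\ne k} f_j^*(x)$, $k = 1,\dots,d^*$.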
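Proposition 1 only needs the gradient of the product map $(y_1,\dots,y_{d^*})\mapsto\prod_k y_k$, which is the vector $\theta$. The NumPy sketch below computes $\theta$ together with the limiting mean $\theta^\top\mu_x$ and variance $\theta^\top\Sigma_x\theta$; the values chosen for $f^*(x)$, $\mu_x$ and $\Sigma_x$ are made up for illustration and are not estimates of anything, and the function name is hypothetical.

```python
import numpy as np


def product_delta_method(f_star, mu, Sigma):
    """Gradient theta and the mean theta'mu and variance theta'Sigma theta of
    the normal limit in Proposition 1 (delta method for the product map)."""
    f_star = np.asarray(f_star, dtype=float)
    # theta_k = prod_{j != k} f*_j(x), the k-th partial derivative of prod_j f*_j
    theta = np.array([np.prod(np.delete(f_star, k)) for k in range(f_star.size)])
    return theta, theta @ mu, theta @ Sigma @ theta


# d = 3 gives d* = 3 + 3 = 6 components: three margins and three pair-copulas
f_star = np.array([0.35, 0.40, 0.30, 1.2, 0.9, 1.1])   # hypothetical values
mu = np.zeros(6)
Sigma = np.diag([0.02, 0.02, 0.02, 0.5, 0.5, 0.8])      # hypothetical covariance
theta, mean, var = product_delta_method(f_star, mu, Sigma)
print(mean, var)
```

Since the product is linear in each coordinate, $\theta_k$ also equals $\prod_j f_j^*(x)/f_k^*(x)$ whenever $f_k^*(x)\ne 0$; the explicit leave-one-out product used above avoids that division.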

The standard way to establish the joint normality assumption (12) is to check 
the conditions of the multivariate Lindeberg-Feller central limit theorem (see 
Proposition 2.27 of [50]). We will do this for a particular implementation in 
Section 5 (see Proposition 5). 

5. On an implementation as kernel estimator 

So far we did not specify how the marginal densities, pair-copula densities, and 
h-functions should be estimated. In general, we can tap into the full potential 
of existing methods. In this section, we discuss a particular implementation as 
a kernel estimator. We give low-level conditions under which the assumptions 
of Theorem 1 can be verified. We present corresponding consistency results and
establish asymptotic normality of $\hat f_{\mathrm{vine}}$. Similar results could be obtained for
other implementations. Another issue is that we assumed the structure of the 
vine to be known. Some heuristics to select an appropriate vine structure are 
discussed at the end of this section. 

5.1. Estimation of marginal densities and distribution functions

Univariate kernel density and distribution function estimators have been extensively
studied in the literature. To this day, they are most popular in their original
form [41, 38]: for all $x\in\mathbb{R}$,

$$\hat f_\ell(x) := \frac{1}{nb_n}\sum_{i=1}^n K\Big(\frac{x - X_\ell^{(i)}}{b_n}\Big), \qquad \hat F_\ell(x) := \frac{1}{n}\sum_{i=1}^n J\Big(\frac{x - X_\ell^{(i)}}{b_n}\Big), \qquad (13)$$

where $b_n > 0$ is the bandwidth parameter, $K$ is a kernel function and $J(x) =
\int_{-\infty}^x K(s)\,ds$ the integrated kernel. We impose the following assumptions on the
kernel function, bandwidth sequence, and marginal distributions.

K1: The kernel function $K$ is a symmetric probability density function supported
on $[-1,1]$ and has continuous first-order derivative.

K2: The bandwidth sequence $b_n$ satisfies $b_n\to 0$ and $\ldots/\ln n\to\infty$.

M1: For all $\ell = 1,\dots,d$, $f_\ell$ is strictly positive on $\mathbb{R}$ and has uniformly continuous
second-order derivative.
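A minimal sketch of the estimators (13), assuming NumPy and SciPy. It uses the biweight kernel $K(s) = \tfrac{15}{16}(1-s^2)^2$ on $[-1,1]$, which is one kernel satisfying K1 (its derivative vanishes at $\pm 1$), together with its integrated kernel $J$; the bandwidth constant, the simulated data and the function names are illustrative assumptions. The last lines compute the constant $\sigma_K^2 = \int s^2K(s)\,ds$ that appears in the bias term in the proof of Proposition 2.

```python
import numpy as np
from scipy.integrate import quad


def K(s):
    """Biweight kernel: symmetric density on [-1, 1] with continuous K' (K1)."""
    s = np.asarray(s, dtype=float)
    return np.where(np.abs(s) <= 1.0, 15.0 / 16.0 * (1.0 - s**2) ** 2, 0.0)


def J(s):
    """Integrated kernel J(s) = int_{-inf}^{s} K(t) dt."""
    s = np.clip(np.asarray(s, dtype=float), -1.0, 1.0)
    return 0.5 + 15.0 / 16.0 * (s - 2.0 * s**3 / 3.0 + s**5 / 5.0)


def f_hat(x, data, b):
    """Kernel density estimate (13): (n b)^{-1} sum_i K((x - X_i) / b)."""
    z = (np.atleast_1d(x)[:, None] - data[None, :]) / b
    return K(z).mean(axis=1) / b


def F_hat(x, data, b):
    """Kernel distribution estimate (13): n^{-1} sum_i J((x - X_i) / b)."""
    z = (np.atleast_1d(x)[:, None] - data[None, :]) / b
    return J(z).mean(axis=1)


rng = np.random.default_rng(0)
data = rng.standard_normal(500)
b = 2.0 * data.size ** (-1 / 5)          # illustrative bandwidth of order n^(-1/5)

# sigma_K^2 = int s^2 K(s) ds, the constant in the bias term of Proposition 2
sigma2_K, _ = quad(lambda s: float(s**2 * K(s)), -1.0, 1.0)
print(sigma2_K)                          # approximately 1/7 = 0.142857... (biweight)
```

Because $\hat F_\ell$ is built from the integrated kernel, it is the exact antiderivative of $\hat f_\ell$, so both estimates are smooth and share the same bandwidth.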

The following result gives the rate of strong uniform consistency for $\hat f_\ell$.

Proposition 2. Under conditions K1, K2, and M1, the estimator (13) satisfies

$$\sup_{x\in\mathbb{R}}\big|\hat f_\ell(x) - f_\ell(x)\big| = O_{a.s.}\Big(b_n^2 + \sqrt{\ln n/(nb_n)}\Big),$$

for all $\ell = 1,\dots,d$.

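Proof. A standard result for kernel density estimation [see, e.g., 44, Section 6.2.1]
is

$$E\{\hat f_\ell(x)\} - f_\ell(x) = \frac{\sigma_K^2 b_n^2}{2}\,\frac{\partial^2}{\partial x^2}f_\ell(x) + o(b_n^2),$$

where $\sigma_K^2 = \int_{-1}^1 s^2K(s)\,ds < \infty$ by K1 and $\partial^2 f_\ell(x)/\partial x^2$ is bounded by M1.
The claim then follows from Theorem 2.3 of [21], which states

$$\sup_x\big|\hat f_\ell(x) - E\{\hat f_\ell(x)\}\big| = O_{a.s.}\big(\sqrt{\ln n/(nb_n)}\big). \qquad \square$$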


Proposition 2 implies pointwise weak consistency of $\hat f_\ell$ as well as strong uniform
consistency of $\hat F_\ell$ with the same rate. In both cases the rate could be improved, but
the result will be sufficient for our purposes. The mean-square optimal bandwidth
for $\hat f_\ell$ is $b_n = O(n^{-1/5})$, for which Proposition 2 holds with rate $O_{a.s.}(n^{-2/5}\sqrt{\ln n})$.

Extensions of the above estimator comprise variable bandwidth methods [42], 
transformation techniques for heavy-tailed distributions [8], and boundary kernel 
estimators that avoid bias and consistency issues on bounded support [9]. 
5.2. Estimation of pair-copula densities

Nonparametric estimation of copula densities requires caution because they are 
supported on the unit hypercube. An estimator that takes no account of this 
property will suffer from bias issues at the boundaries of the support. A few 
kernel estimators particularly suited for bivariate copula densities were proposed 
in the literature [19, 11, 16]. Other nonparametric estimators can be constructed 
based on Bernstein polynomials [43], B-splines [31], or wavelets [17]. 

In this paper, we will use the transformation estimator of [11]. The idea is
to transform the data to standard normal margins (and therefore unbounded
support), where the transformed density gets estimated by a standard kernel
estimator. Then, this estimate is transformed back to uniform margins. Denote
$\Phi$, $\Phi^{-1}$, and $\phi$ as the standard Gaussian cdf, quantile and density functions. For
$s\in\mathbb{R}^2$, let us write short $K(s) = K(s_1)K(s_2)$, and $K_{B_n}(s) = K(B_n^{-1}s)/\det(B_n)$
for some positive definite bandwidth matrix $B_n\in\mathbb{R}^{2\times 2}$. The transformation
estimator is defined via

$$\hat c_{j_e,k_e;D_e}(u,v) := \frac{1}{n}\sum_{i=1}^n\frac{K_{B_n}\Big(\Phi^{-1}(u) - \Phi^{-1}\big(\hat U_{j_e|D_e}^{(i)}\big),\ \Phi^{-1}(v) - \Phi^{-1}\big(\hat U_{k_e|D_e}^{(i)}\big)\Big)}{\phi\{\Phi^{-1}(u)\}\,\phi\{\Phi^{-1}(v)\}}. \qquad (14)$$

In order to verify the high-level assumptions A2a and A3a, we need the following two conditions to hold for all $e \in E_1, \dots, E_{d-1}$.

C1: The true pair-copula densities $c_{j_e,k_e;D_e}$ are twice continuously differentiable on $(0,1)^2$.

C2: The transformed densities $\psi_{j_e,k_e;D_e}(x,y) = c_{j_e,k_e;D_e}\{\Phi(x), \Phi(y)\}\phi(x)\phi(y)$ have continuous and bounded first- and second-order derivatives on $\mathbb{R}^2$.

C1 is a smoothness condition that is very common in nonparametric estimation. C2 is less standard as it relates to the transformed density. Sufficient conditions for C2 are given in Lemma A.1 of [16] and can be verified for many parametric families, including the ones used in our simulation study.

To avoid unnecessary technicalities, we will assume here that the bandwidth matrix is a multiple of the identity matrix: $B_n = b_n \times I_2$.


Proposition 3. Under conditions K1, K2, C1, and C2, the estimator (14) satisfies for all $(u,v) \in (0,1)^2$, $e \in E_1, \dots, E_{d-1}$,

$$\bar c_{j_e,k_e;D_e}(u,v) - c_{j_e,k_e;D_e}(u,v) = O_p\big(b_n^2 + \sqrt{1/(n b_n^2)}\big),$$

$$\hat c_{j_e,k_e;D_e}(u,v) - \bar c_{j_e,k_e;D_e}(u,v) = O_{a.s.}(a_{e,n}).$$

Proof. For the first equality, see Section 3.4 in [35]. For the second, see Lemma B1 
in Appendix B. □ 

When the mean-square optimal bandwidth $b_n = O(n^{-1/6})$ is used, the right-hand side of the first equality is $O_p(n^{-1/3})$.
5.3. Estimation of h-functions

Recall that h-functions are actually conditional distribution functions:

$$h_{j_e|k_e;D_e}(u|v) = \Pr(U_{j_e|D_e} \le u \mid U_{k_e|D_e} = v) = \mathbb{E}\big[\mathbb{1}(U_{j_e|D_e} \le u) \mid U_{k_e|D_e} = v\big].$$

The second equality relates the conditional cdf to a regression problem. Hence, any nonparametric regression estimator is suitable for estimation of the h-functions. In our case, it is even simpler to integrate the density estimate to obtain an estimate of the corresponding h-function. For the oracle estimators, we set

$$\bar h_{j_e|k_e;D_e}(u|v) := \int_0^u \bar c_{j_e,k_e;D_e}(s,v)\,ds, \qquad \bar h_{k_e|j_e;D_e}(v|u) := \int_0^v \bar c_{j_e,k_e;D_e}(u,s)\,ds, \qquad (15)$$


and the feasible estimators $\hat h_{j_e|k_e;D_e}$ and $\hat h_{k_e|j_e;D_e}$ are defined similarly. Such estimators are closely related to the smoothed Nadaraya-Watson estimator of [22]. In fact, they coincide when we choose diagonal $B_n$ in (14). For an explicit formula, see (22) in Appendix B. The following result puts this estimator in the context of A2b and A3b.
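Integrating (14) in $u$ gives a closed form for the h-function estimator. With a Gaussian kernel (again our assumption), the integrated kernel is simply $\Phi(t/b)$. The Python sketch below evaluates this closed form, which corresponds to (22) in Appendix B; the function and argument names are hypothetical.

```python
import random
from statistics import NormalDist

_N = NormalDist()


def h_function_estimate(u, v, obs_u, obs_v, b):
    """Estimate of h(u | v) obtained by integrating the transformation estimator (14) over u.

    Assumes a Gaussian kernel, so int_{-inf}^t K_b(s) ds = Phi(t / b).
    All inputs must lie strictly inside (0, 1).
    """
    z1, z2 = _N.inv_cdf(u), _N.inv_cdf(v)
    total = 0.0
    for wu, wv in zip(obs_u, obs_v):
        integrated = _N.cdf((z1 - _N.inv_cdf(wu)) / b)  # integrated kernel in the u-direction
        k2 = _N.pdf((z2 - _N.inv_cdf(wv)) / b) / b     # kernel in the conditioning direction
        total += integrated * k2
    return total / (len(obs_u) * _N.pdf(z2))


if __name__ == "__main__":
    rng = random.Random(7)
    us = [rng.uniform(1e-9, 1 - 1e-9) for _ in range(1000)]
    vs = [rng.uniform(1e-9, 1 - 1e-9) for _ in range(1000)]
    print([round(h_function_estimate(u, 0.5, us, vs, 0.25), 3) for u in (0.2, 0.5, 0.8)])
```

Since every summand is increasing in $u$, the estimate is monotone in $u$ by construction, as a conditional cdf should be.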


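Proposition 4. Under conditions K1, K2, C1, and C2, the estimator defined by (15) and (14) satisfies for all $\delta \in (0, 0.5]$ and $e \in E_1, \dots, E_{d-1}$,

$$\sup_{(u,v) \in [\delta, 1-\delta]^2}\big|\bar h_{j_e|k_e;D_e}(u|v) - h_{j_e|k_e;D_e}(u|v)\big| = O_{a.s.}\big(b_n^2 + \sqrt{\ln n/(n b_n)}\big),$$

$$\sup_{(u,v) \in [\delta, 1-\delta]^2}\big|\bar h_{k_e|j_e;D_e}(v|u) - h_{k_e|j_e;D_e}(v|u)\big| = O_{a.s.}\big(b_n^2 + \sqrt{\ln n/(n b_n)}\big),$$

$$\sup_{(u,v) \in [\delta, 1-\delta]^2}\big|\hat h_{j_e|k_e;D_e}(u|v) - \bar h_{j_e|k_e;D_e}(u|v)\big| = O_{a.s.}(a_{e,n}),$$

$$\sup_{(u,v) \in [\delta, 1-\delta]^2}\big|\hat h_{k_e|j_e;D_e}(v|u) - \bar h_{k_e|j_e;D_e}(v|u)\big| = O_{a.s.}(a_{e,n}).$$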








Proof. See Lemmas B2 and B3 in Appendix B. □ 

The optimal rate of convergence in the first two equalities is $O_{a.s.}\{(\ln n/n)^{2/5}\}$ and is attained for $b_n = O\{(\ln n/n)^{1/5}\}$.

Assumption A2b requires that the error of estimating the h-function vanishes faster than the error of pair-copula density estimation. This is readily achieved by using the optimal bandwidth in each component. However, it may be more convenient to use the same bandwidth for pair-copula density as well as h-function estimation. It seems natural to use the optimal rate for pair-copula density estimation, $b_n = O(n^{-1/6})$. But this violates A2b, because both estimators converge with the same rate $n^{-1/3}$. To overcome this, we have to increase the speed of $b_n$ by a small amount, i.e., to undersmooth the pair-copula density estimate. When $b_n = a_n n^{-1/6}$ with $a_n = o(1)$, the pair-copula density estimator converges with rate $a_n^{-1} n^{-1/3}$ and the h-function estimator with rate $a_n^2 n^{-1/3} + a_n^{-1/2} n^{-5/12}\sqrt{\ln n} = o(a_n^{-1} n^{-1/3})$. But the sequence $a_n$ can converge arbitrarily slowly. So we should not expect any problems with using the mean-square optimal rate in practice. This was confirmed by preliminary numerical experiments.
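The effect of undersmoothing can be checked by dividing the two rates. The display below is our own arithmetic, based on the rates stated in Propositions 3 and 4 with $b_n = a_n n^{-1/6}$:

```latex
\begin{aligned}
\text{pair-copula density: } & b_n^2 + (n b_n^2)^{-1/2} = a_n^2 n^{-1/3} + a_n^{-1} n^{-1/3}, \\
\text{h-function: } & b_n^2 + \{\ln n / (n b_n)\}^{1/2} = a_n^2 n^{-1/3} + a_n^{-1/2} n^{-5/12} (\ln n)^{1/2}, \\
\text{ratio: } & \frac{a_n^2 n^{-1/3} + a_n^{-1/2} n^{-5/12} (\ln n)^{1/2}}{a_n^{-1} n^{-1/3}}
  = a_n^{3} + a_n^{1/2} n^{-1/12} (\ln n)^{1/2}.
\end{aligned}
```

For $a_n = o(1)$ the ratio tends to zero, however slowly $a_n$ vanishes. For $a_n \equiv 1$, i.e., the common mean-square optimal bandwidth, it tends to one, which is exactly why the two rates coincide in that case.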

5.4. Asymptotic normality 

We now put all pieces together and show that the estimator $\hat f_{\mathrm{vine}}$ composed of (13), (14), and (15) is asymptotically normal. We start by establishing the joint asymptotic normality of all components. The proof is deferred to Appendix C.

Proposition 5. Assume that

(i) conditions K1, M1, C1, and C2 hold,

(ii) $\hat f_j$ and $\hat F_j$ are defined by (13) with (marginal) bandwidth parameter $b_{n,m}$,

(iii) $\hat c_{j_e,k_e;D_e}$ is defined by (14) with (copula) bandwidth parameter $b_{n,c}$,

(iv) $\hat h_{j_e|k_e;D_e}$ is defined by (15) and (14) with (h-function) bandwidth parameter $b_{n,h}$,

(v) it holds $b_{n,c} = O(n^{-1/6})$, and for sufficiently large $n$,

$$b_{n,c}^2 < b_{n,m} < b_{n,h} < \min\{b_{n,c}, \ldots/\log n\}.$$

Recall the definition of $f^*(x)$, $\hat f^*(x)$, and $d^*$ from Section 4.2. It holds for all $x \in \mathbb{R}^d$,

$$(n b_{n,c}^2)^{1/2}\big\{\hat f^*(x) - f^*(x) - b_{n,c}^2\,\mu_x\big\} \xrightarrow{d} \mathcal{N}_{d^*}(0, \Sigma_x), \qquad (16)$$

where $\mu_x = (0_d^\top, \bar\mu_x^\top)^\top$ with $\bar\mu_x = (\mu_{x,e})_{e \in E_1,\dots,E_{d-1}}$, and $\Sigma_x$ is diagonal with first $d$ diagonal entries equal to 0 and remaining diagonal entries $(\sigma_{x,e}^2)_{e \in E_1,\dots,E_{d-1}}$. Explicit expressions for $\mu_{x,e}$ and $\sigma_{x,e}$ are given in (30) and (32) in Appendix C.

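The asymptotic normality of $\hat f_{\mathrm{vine}}$ follows by an application of the delta method.

Corollary 1. Under the assumptions of Proposition 5 it holds for all $x \in \mathbb{R}^d$,

$$(n b_{n,c}^2)^{1/2}\big\{\hat f_{\mathrm{vine}}(x) - b_{n,c}^2\,\theta^\top\mu_x - f(x)\big\} \xrightarrow{d} \mathcal{N}(0, \theta^\top\Sigma_x\theta),$$

where $\theta_k = \prod_{j \neq k} f^*_j(x)$, $k = 1, \dots, d^*$, and $\mu_x$, $\Sigma_x$ are as in Proposition 5.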
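Since $\hat f_{\mathrm{vine}}$ is a product of the $d^*$ components of $\hat f^*$, the delta method only needs the partial derivatives $\theta_k$ of the product map, i.e., the product of all other components. The following Python sketch assumes all components are positive and $\Sigma_x$ is diagonal as in Proposition 5; the function name is hypothetical.

```python
import math


def delta_method_terms(components, mu, sigma_diag):
    """Delta-method terms for f = prod_k f*_k.

    components: positive values f*_1(x), ..., f*_{d*}(x)
    mu:         asymptotic bias vector mu_x
    sigma_diag: diagonal of Sigma_x (the first d entries are zero)
    Returns f(x), theta, the bias coefficient theta' mu and the variance theta' Sigma theta.
    """
    f = math.prod(components)
    theta = [f / fk for fk in components]  # d f / d f*_k = product of the other components
    bias_coef = sum(t * m for t, m in zip(theta, mu))
    variance = sum(t * t * s for t, s in zip(theta, sigma_diag))  # diagonal Sigma only
    return f, theta, bias_coef, variance
```

Dividing the product by one component is a shortcut for the product of the others and is valid only because all components are positive.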


5.5. Structure selection 

Finding the optimal structure for vine copulas is extremely difficult. Because of the large number of possibilities, practical approaches are usually based on heuristics. In some situations, expert knowledge can be used to decide which pair-wise dependencies should be modeled explicitly. If there is no meaningful prior information, the structure selection algorithm of [14] can be adopted. Starting with the first tree, we select the tree that is a maximum (or minimum) spanning tree w.r.t. some weight function $w_e$ assigning a weight to each pair of pseudo-observations. The most popular weights are empirical estimates of $\tau_e$, the (unconditional) Kendall’s $\tau$ corresponding to $c_{j_e,k_e;D_e}$. They can be estimated sequentially from the pseudo-observations defined in Algorithm 1. The idea is to choose a structure that captures most of the dependence in lower trees. Other possible weights are the AIC or goodness-of-fit p-values corresponding to a pair-copula estimate; see [13] for a discussion. By using kernel density estimators for the pair-copulas, we get a fully nonparametric structure selection algorithm.
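The first step of this heuristic can be sketched in a few lines of Python. As assumptions on our part, the sketch uses the absolute value of the empirical Kendall’s $\tau$ as edge weight, a naive $O(n^2)$ estimate of $\tau$ without tie handling, and Kruskal’s algorithm for the maximum spanning tree. Later trees, which must additionally respect the proximity condition of a vine, are omitted.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Naive O(n^2) empirical Kendall's tau (assumes no ties)."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[j] - x[i]) * (y[j] - y[i])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return s / (n * (n - 1) / 2)


def first_tree(columns):
    """Maximum spanning tree over variables w.r.t. |Kendall's tau| (Kruskal + union-find)."""
    d = len(columns)
    weights = {
        (a, b): abs(kendall_tau(columns[a], columns[b]))
        for a, b in combinations(range(d), 2)
    }
    parent = list(range(d))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree = []
    for (a, b), _ in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        ra, rb = find(a), find(b)
        if ra != rb:  # adding this edge does not create a cycle
            parent[ra] = rb
            tree.append((a, b))
    return tree
```

Greedily adding the strongest remaining edges is what makes the selected tree capture as much (rank) dependence as possible in the first tree.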


6. Simulations 

In this section, we study the finite sample behavior of a vine copula based kernel density estimator. We illustrate its advantages compared with the classical kernel density estimator in three scenarios that comprise one simplified and two non-simplified target densities.

6.1. Implementation of estimators 

The study was carried out in the statistical computing environment R [39]. We use the implementation of $\hat f_{\mathrm{vine}}$ introduced in the previous section:

• Marginal densities are estimated by the standard kernel density estimator (13). Bandwidths are selected by the plug-in method of [10], as implemented in the function hpi of the ks package [15].

• Marginal distributions are estimated by integrating the estimates of the marginal densities.

• Pair-copula densities are estimated by the transformation estimator (14) with bandwidth matrix selected by the normal reference rule; see, e.g., Section 3.4 in [35].

• The vine structure is considered unknown and selected by the method of [14] using empirical estimates of $\tau_e$ as weight function (see Section 5.5).

The estimator $\hat f_{\mathrm{vine}}$ is implemented in the R package kdevine [36]. The package also includes estimators for marginals with bounded support as well as more sophisticated pair-copula estimators which further improve the performance. For the classical multivariate kernel density estimator ($\hat f_{\mathrm{mvkde}}$ from here on) we use the function kde provided by the ks package [15]. It selects the bandwidths by the plug-in method of [10].
6.2. Performance measurement

We evaluated the performance of both estimators for three choices of the target density $f$. To gain insight into their convergence behavior under increasing dimension, we consider five different sample sizes $n = 200, 500, 1\,000, 2\,500, 5\,000$, and three different dimensions $d = 3, 5, 10$. For any fixed target density, sample size, and dimension, we measure the performance as follows:

1. Simulate $n_{\mathrm{sim}} = 250$ samples of size $n$ from a $d$-dimensional target density $f$.

2. On each sample, estimate the density with the estimators $\hat f_{\mathrm{vine}}$ and $\hat f_{\mathrm{mvkde}}$.

3. For each estimator $\hat f \in \{\hat f_{\mathrm{vine}}, \hat f_{\mathrm{mvkde}}\}$ and sample, calculate the integrated absolute error (IAE) as a performance measure:

   $$\mathrm{IAE}(\hat f) := \int_{\mathbb{R}^d} |\hat f(x) - f(x)|\,dx.$$

   The integral is estimated by importance sampling Monte Carlo (e.g., Section 5.2 in [40]), where we take the true density $f$ as the sampling distribution. The number of Monte Carlo samples was set to 1 000. This gives an unbiased, low-variance estimate of the IAE.
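Because the true density $f$ serves as the sampling distribution, the IAE can be rewritten as an expectation, $\mathrm{IAE}(\hat f) = \mathbb{E}_f\{|\hat f(X)/f(X) - 1|\}$, which is then averaged over draws from $f$. Below is a minimal Python sketch of this estimator (names hypothetical). The usage compares two univariate normal densities with a location shift $\mu$, for which the IAE is known in closed form, $2\{2\Phi(\mu/2) - 1\}$.

```python
import random
from statistics import NormalDist


def iae_importance_sampling(f_hat, f_true, draw_true, n_mc=1000, seed=None):
    """Monte Carlo estimate of IAE(f_hat) = int |f_hat(x) - f(x)| dx.

    Uses f itself as the sampling distribution, so the integral equals
    E_f[|f_hat(X) / f(X) - 1|]; draw_true(rng) must return one draw from f.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_mc):
        x = draw_true(rng)
        total += abs(f_hat(x) / f_true(x) - 1.0)
    return total / n_mc


if __name__ == "__main__":
    truth, shifted = NormalDist(0.0, 1.0), NormalDist(1.0, 1.0)
    est = iae_importance_sampling(shifted.pdf, truth.pdf, lambda r: r.gauss(0.0, 1.0), n_mc=20000, seed=3)
    exact = 2 * (2 * NormalDist().cdf(0.5) - 1)
    print(round(est, 3), round(exact, 3))
```

Each summand is an unbiased estimate of the integral, which is why the average is unbiased; its variance depends on how far $\hat f$ deviates from $f$ in the tails.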

In the following section we present the median IAE attained over the 250 simulations. Additionally, we use Mood’s median test [18] to check whether the difference in performance is statistically significant at the 1% level. Significant results are indicated by stars above the sample size axes of Figure 2.
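For reference, Mood’s median test compares how many observations of each sample exceed the pooled median. The Python sketch below is our own minimal version (hypothetical name). It uses Pearson’s chi-square statistic on the resulting 2 × 2 table, without continuity correction, and the chi-square distribution with one degree of freedom for the p-value. Library implementations may differ in tie handling and correction.

```python
import math
from statistics import median


def mood_median_test(a, b):
    """Mood's median test for two samples; returns (chi-square statistic, p-value)."""
    grand = median(list(a) + list(b))
    sizes = [len(a), len(b)]
    above = [sum(x > grand for x in a), sum(x > grand for x in b)]
    table = [[above[k], sizes[k] - above[k]] for k in range(2)]  # rows: samples
    n = sum(sizes)
    col = [above[0] + above[1], n - above[0] - above[1]]  # totals above / not above
    chi2 = 0.0
    for k in range(2):
        for c in range(2):
            expected = sizes[k] * col[c] / n
            if expected > 0:
                chi2 += (table[k][c] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # P(chi2_1 > chi2) = P(|Z| > sqrt(chi2))
    return chi2, p_value
```

Because the test only uses counts above and below the median, it is robust to the heavy right tail that IAE distributions often have.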

6.3. Results 

In the following, we illustrate the main insights of our numerical experiments in three examples: one where the simplifying assumption holds, and two where it does not. Since the simplifying assumption is a property of the copula, we focus on this part and set the marginal densities to standard Gaussian in all scenarios. For these margins, the two estimators $\hat f_{\mathrm{vine}}$ and $\hat f_{\mathrm{mvkde}}$ are asymptotically equivalent when $d = 2$. But they become different as soon as the simplifying assumption becomes relevant, i.e., when $d > 2$. Hence, differences in the performance of the two estimators can be directly related to the fact that $\hat f_{\mathrm{vine}}$ assumes a simplified model. Additional simulation results for common parametric copula families (both simplified and non-simplified) and varying strength of dependence are provided in the online supplement.

Scenario 1: Gaussian copula

The first scenario concerns the estimation of a $d$-dimensional Gaussian density. For simplicity, we choose the parameters such that all pair-wise Kendall’s $\tau$ equal 0.4 (this corresponds to an association parameter of $\rho \approx 0.6$). Recall that the simplifying assumption is a property of the dependence, i.e., the copula. The copula underlying a multivariate Gaussian density is the Gaussian copula, which belongs to the class of simplified vine distributions [47]. Consequently, the vine copula based estimator is consistent in this situation.

[Figure 2 panels: (a) Gaussian copula, (b) Gumbel copula, (c) Non-simplified Gaussian vine; each panel is shown for d = 3, 5, 10.]

Figure 2: Median integrated absolute error achieved for varying sample size $n$ and dimension $d$. The estimator $\hat f_{\mathrm{vine}}$ is indicated by circles; $\hat f_{\mathrm{mvkde}}$ by triangles. A star above the sample size means that the corresponding medians were found significantly different at the 1% level by Mood’s median test.

Figure 2a shows the median IAE of $\hat f_{\mathrm{vine}}$ (circles) and $\hat f_{\mathrm{mvkde}}$ (triangles) for varying sample size $n$ and dimension $d$. The vine copula based estimator strictly outperforms the classical estimator by a considerable margin. The difference in IAEs is statistically significant for all dimensions and sample sizes. As predicted by Theorem 1, we observe that, in contrast to the classical kernel density estimator, the vine copula based estimator converges at the same rate independent of dimension. Thus, the gap widens as dimension or sample size increases. For $d = 5$, $\hat f_{\mathrm{vine}}$ is almost two times as accurate; for $d = 10$ almost three times as accurate. These numbers are remarkable considering how slowly $\hat f_{\mathrm{mvkde}}$ improves its accuracy when the sample size increases. The same conclusions can be drawn from the additional simulation results for simplified models provided in the online supplement.

Scenario 2: Gumbel copula 

Our second scenario, a Gumbel copula coupled with standard normal margins, violates the simplifying assumption; see Theorem 3.1 in [47]. Again, we choose the parameter of the Gumbel copula such that all pair-wise Kendall’s $\tau$ equal 0.4 (this corresponds to a Gumbel copula parameter $\theta \approx 1.67$). In this case, $\hat f_{\mathrm{mvkde}}$ is guaranteed to outperform $\hat f_{\mathrm{vine}}$ as $n \to \infty$, because the latter is not consistent. On finite samples, however, the picture seems to be different.
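The parameter values quoted in Scenarios 1 and 2 follow from the standard relations between Kendall’s $\tau$ and the copula parameters: $\tau = (2/\pi)\arcsin\rho$ for the Gaussian copula and $\tau = 1 - 1/\theta$ for the Gumbel copula. A quick Python check of both conversions:

```python
import math


def gaussian_rho_from_tau(tau):
    """Gaussian copula: Kendall's tau = (2 / pi) * arcsin(rho), hence rho = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2)


def gumbel_theta_from_tau(tau):
    """Gumbel copula: Kendall's tau = 1 - 1 / theta, hence theta = 1 / (1 - tau) for 0 <= tau < 1."""
    return 1.0 / (1.0 - tau)


print(round(gaussian_rho_from_tau(0.4), 3))  # 0.588
print(round(gumbel_theta_from_tau(0.4), 3))  # 1.667
```

Matching the pair-wise Kendall’s $\tau$ across scenarios keeps the overall strength of (rank) dependence comparable, so the differences between scenarios can be attributed to the shape of the dependence.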

The performance of the two estimators in this scenario is displayed in Figure 2b. For $d = 3$, $\hat f_{\mathrm{vine}}$ is slightly worse than its competitor, but the difference is only significant for large sample sizes. For increasing dimension, the gap widens in favor of $\hat f_{\mathrm{vine}}$, which performs significantly better for $d = 5$ and $d = 10$. For $d = 10$ and $n = 5\,000$, the vine copula based estimator is almost two times as accurate, although it is not consistent. Since $\hat f_{\mathrm{mvkde}}$ converges so slowly, an extremely large number of observations would be required until it becomes the better choice. But for commonly available sample sizes and $d > 3$, the vine copula based estimator is preferable. The same findings hold for the additional simulation results for non-simplified models provided in the online supplement.

Scenario 3: Non-simplified Gaussian vine 

Lastly, we want to investigate how the vine copula based estimator behaves in a sort of ‘worst case scenario’. We set up a non-simplified vine copula with Gaussian pair-copulas and formulate their parameters as a regression on the conditioning variables implied by the vine. For each conditional pair-copula, the correlation parameter function $\rho_e\colon [0,1]^{|D_e|} \to [-1,1]$ describes a linear hyperplane ranging from $-1$ to $1$:

$$\rho_e(u_{D_e}) = 1 - \frac{2}{|D_e|}\sum_{j \in D_e} u_j, \qquad \text{for } e \in E_m,\ m \ge 2.$$

Since $\int \rho_e(u_{D_e})\,du_{D_e} = 0$ for all $e \in E_2, \dots, E_{d-1}$, we also set $\rho_e = 0$ for $e \in E_1$. This model severely violates the simplifying assumption for each conditional pair in the vine.
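The parameter function of Scenario 3 is straightforward to code. The Python sketch below (hypothetical names) returns $\rho_e$ for a vector of conditioning values and $0$ for the empty conditioning set of the first tree; a midpoint-rule check confirms numerically that $\rho_e$ averages to zero over $[0,1]^{|D_e|}$.

```python
from itertools import product


def rho_e(u_cond):
    """Correlation of a conditional Gaussian pair-copula in Scenario 3.

    rho_e(u_D) = 1 - (2 / |D_e|) * sum_{j in D_e} u_j, which runs from 1 (all u_j = 0)
    to -1 (all u_j = 1); pairs in the first tree (empty D_e) get rho_e = 0.
    """
    if len(u_cond) == 0:
        return 0.0
    return 1.0 - 2.0 / len(u_cond) * sum(u_cond)


def mean_rho_on_grid(dim, m=20):
    """Midpoint-rule approximation of the integral of rho_e over [0, 1]^dim."""
    grid = [(k + 0.5) / m for k in range(m)]
    values = [rho_e(point) for point in product(grid, repeat=dim)]
    return sum(values) / len(values)
```

The zero average is what motivates setting $\rho_e = 0$ in the first tree: it is the value a simplified model would assign on average.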

The results for this scenario are shown in Figure 2c. The vine copula based estimator performs significantly worse for $d = 3, 5$. Remarkably, $\hat f_{\mathrm{vine}}$ manages to significantly outperform the classical estimator for $d = 10$. The severely non-simplified dependence structure appears to be too difficult to identify even for a nonparametric estimator that does not rely on the simplifying assumption. Extrapolating the curves, we can expect this to hold for sample sizes much larger than those considered in our study. Also, we can expect the advantage of $\hat f_{\mathrm{vine}}$ to become even bigger in higher dimensions. We can conclude that even in this extremely unfavorable example, the estimator $\hat f_{\mathrm{vine}}$ proves useful when more than a few variables are involved.


7. Application 

We revisit a classification problem from astrophysics which has previously been investigated in [7]. In their study, the authors consider synthetic data imitating measurements taken on images from the MAGIC (Major Atmospheric Gamma-ray Imaging Cherenkov) Telescopes located on the Canary Islands. The goal is to identify primary gamma rays (the signal) amongst a large amount of hadron showers (background noise). The authors of the study evaluate the performance of several classification methods and judge the kernel density based Bayes classifier as one of the most convincing. We aim to augment their results and investigate how the vine copula based kernel density estimator performs on this problem.

The data set is available from the UCI Machine Learning Repository web page (url: https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope) and consists of $n = 19\,020$ observations on $d = 10$ variables. Of these, $n_G = 12\,332$ observations are classified as gamma (signal) and $n_H = 6\,688$ as hadron (background). For more information on the astrophysical background and a more thorough description of the data we refer the reader to [7] and the UCI web page.

Bayes classifiers follow the idea of maximizing the posterior probability of a class given the data. Let G (for gamma) and H (for hadron) be the two classes and $\hat f_G$ and $\hat f_H$ be two density estimates fitted separately in each class. Assume further that we know the class prior probabilities $\pi_G$ and $\pi_H$. With a straightforward application of Bayes’ theorem, we can estimate the posterior probability that the class is G as

$$\widehat{\Pr}(\mathrm{Class} = G \mid X = x) = \frac{\pi_G \hat f_G(x)}{\pi_G \hat f_G(x) + \pi_H \hat f_H(x)}, \qquad (17)$$

where $x$ is a realization of the random vector $X$. In the simplest case, we classify an observation as G whenever the estimated posterior probability is greater than $\alpha = 0.5$. However, by changing the threshold $\alpha$ we can furthermore control how many observations get classified as G, and thereby influence key quantities such as the false positive rate (FPR) or true positive rate (TPR). The FPR is defined as the ratio of the number of false positives (here: hadron events that were misclassified as gamma) and the number of all negative (hadron) events. The TPR is defined as the ratio of the number of correctly classified positive (gamma) events and the number of all positive events. In general, it is desirable to have a low FPR and a high TPR. But usually, there is a tradeoff between the two quantities: if we increase the threshold level $\alpha$, a higher posterior probability is required for an observation to get classified as a gamma event. As a result, fewer observations will be classified as gamma events, which in turn reduces both FPR and TPR.

Figure 3: ROC curves for Bayes classifiers based on the vine copula based estimator (solid line) and classical multivariate kernel density estimator (dashed line).
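The classification rule and the two error rates can be sketched in Python as follows, assuming the class-conditional density values $\hat f_G(x)$ and $\hat f_H(x)$ have already been computed. Labels are encoded as the hypothetical strings "G" and "H", and the function names are our own.

```python
def posterior_gamma(f_g, f_h, pi_g=0.5, pi_h=0.5):
    """Estimated posterior probability of class G via Bayes' theorem, cf. (17)."""
    weighted_g = pi_g * f_g
    return weighted_g / (weighted_g + pi_h * f_h)


def fpr_tpr(posteriors, labels, alpha=0.5):
    """False and true positive rates when observations with posterior > alpha are classified as G."""
    predicted_g = [p > alpha for p in posteriors]
    tp = sum(1 for pred, lab in zip(predicted_g, labels) if pred and lab == "G")
    fp = sum(1 for pred, lab in zip(predicted_g, labels) if pred and lab == "H")
    n_pos = sum(1 for lab in labels if lab == "G")
    n_neg = len(labels) - n_pos
    return fp / n_neg, tp / n_pos
```

Sweeping `alpha` from 0 to 1 and recording the resulting pairs traces out an ROC curve; raising `alpha` can only shrink the set of observations classified as gamma, so neither rate can increase.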

We repeat the experiment of [7] with the vine copula based and classical kernel estimators. The implementations are similar to our simulation study (see Section 6.1). As is common in applications, we induce sparsity of the estimated model by adding an independence test to the structure selection algorithm; see Section 4 in [14]. We also found it necessary to multiply the marginal bandwidth parameters of $\hat f_{\mathrm{vine}}$ by 2 to stabilize the classification boundary in low-density regions. The experiment’s setup is the following: First, the densities for each class are estimated on the first 2/3 of the data, which is used as training set. These estimates are used in combination with (17) to obtain class predictions for the remaining 1/3. For simplicity, the prior probabilities are set to $\pi_G = \pi_H = 0.5$. The predictions are then compared to the actual class of the observations, which allows us to assess the quality of the predictions.

Figure 3 shows the receiver operating characteristic (ROC) curve, which displays the TPR as a function of the FPR. It was noted in [7] that in this application the focus is on low FPR levels, in particular the 0.01, 0.02, 0.05, 0.1, and 0.2 levels.

FPR      0.01    0.02    0.05    0.1     0.2
vine     0.335   0.428   0.652   0.780   0.918
mvkde    0.335   0.408   0.567   0.730   0.868

Table 1: True positive rates for the two estimators (second and third row) for given target levels of the false positive rate (first row).
The TPR values of the ROC curves at these levels are additionally displayed in Table 1. The ROC curve of the vine copula based estimator lies above the curve of the classical multivariate kernel density estimator almost everywhere. This means that for a target FPR level, the vine copula based classifier is able to identify more observations correctly as signal events than the classical multivariate kernel density estimator. The results confirm what we could expect from our simulation study where, for $d = 10$ and several thousand observations, the vine copula based approach delivered much more accurate estimates.

But also in comparison with other classification algorithms, the classifier based on $\hat f_{\mathrm{vine}}$ performs extraordinarily well. A total of 14 algorithms were surveyed in [7], including variants of classification trees and neural networks, as well as the popular nearest-neighbor method and support vector machine. Two of the main performance measures used in their study are the average of the TPR at the 0.01, 0.02 and 0.05 FPR levels (termed loacc), and the average of the TPR at the 0.1 and 0.2 FPR levels (termed highacc). From Table 1 we calculate loacc = 0.472 and highacc = 0.849. None of the 14 algorithms was able to produce a better loacc value than our approach. And only one method, random forests, delivered a slightly higher highacc of 0.852. This is particularly remarkable when we consider that the parameterization of our estimator was not tuned with respect to classification accuracy (unlike other classification algorithms). It might well be that the performance can be further improved by bandwidth and structure selection strategies that aim for classification rather than estimation accuracy.
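The two summary measures follow directly from Table 1. The short Python check below recomputes them for both estimators; the dictionary layout is ours.

```python
# True positive rates from Table 1, keyed by the target FPR level.
TPR = {
    "vine": {0.01: 0.335, 0.02: 0.428, 0.05: 0.652, 0.1: 0.780, 0.2: 0.918},
    "mvkde": {0.01: 0.335, 0.02: 0.408, 0.05: 0.567, 0.1: 0.730, 0.2: 0.868},
}


def loacc(tpr):
    """Average TPR at the 0.01, 0.02 and 0.05 FPR levels."""
    return (tpr[0.01] + tpr[0.02] + tpr[0.05]) / 3


def highacc(tpr):
    """Average TPR at the 0.1 and 0.2 FPR levels."""
    return (tpr[0.1] + tpr[0.2]) / 2


for name, tpr in TPR.items():
    print(name, round(loacc(tpr), 3), round(highacc(tpr), 3))
# vine 0.472 0.849
# mvkde 0.437 0.799
```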

8. Further discussion 

In this paper, we discuss a vine copula approach to nonparametric density estimation. By assuming that the target density belongs to the class of simplified vine densities, we can divide the estimation of a $d$-dimensional density into several one- and two-dimensional tasks. This allows us to achieve faster convergence rates than classical nonparametric estimators when $d \ge 3$. In particular, the speed of convergence is independent of dimension. The advantages of this approach become more and more striking as dimension increases. This shows that a simplified vine model for the dependence between variables is an appealing structure for nonparametric problems. For example, we can expect that similar results can be obtained for copula-based regression models [37, 33].

The crux of our approach is the simplifying assumption. If the simplifying assumption is not satisfied, the proposed estimator is not consistent, but it can nevertheless outperform its competitor in most practically relevant situations. However, the latter finding may not hold if the simplifying assumption is violated in an extreme fashion and the dimension is small. We conjecture that this is a rather unlikely situation to encounter in real data. However, appropriate tests for a formal empirical assessment have yet to be developed. From a theoretical point of view, this answer is highly unsatisfying and several pressing questions arise:

• How dense does the set of simplified densities lie in the set of all densities? Put differently: how far off can we be by assuming a simplified model?

• How can we interpret the components of an estimated simplified model when the assumption does not hold?

Owing to the infancy of vine copula models, these questions remain open to this day. But several recent works have advanced the understanding of the simplifying assumption. A discussion of its appropriateness can be found in [26]. Copula classes where the simplifying assumption is satisfied are given in [47]. In [20], a general estimator of the copula was proposed for the case where a covariate affects only the marginal distributions (i.e., when the simplifying assumption does hold). Semiparametric estimation of three-dimensional non-simplified PCCs was tackled in [3]; a test for the simplifying assumption was proposed in [2] under a semiparametric model. The empirical pair-copula, an extension of the empirical copula to simplified vine copulas, was analyzed in [27]. The authors conjecture that this estimator converges at the parametric rate, even when pseudo-observations are used. The situation is different from ours since empirical copulas do not suffer from the curse of dimensionality.

The notion of partial vine copula approximations (PVCA), i.e., the limit of a step-wise estimator under a simplified model, was introduced in [46]. The authors show that the PVCA is not necessarily the best simplified approximation to the true density. They further illustrate in an example that spurious dependence patterns can appear in trees $T_m$, $m \ge 3$, when the simplifying assumption has been falsely assumed in previous trees $T_{m'}$, $2 \le m' < m$. This property may not matter much in terms of estimation accuracy, but can corrupt the interpretability of an estimated PVCA. The estimator proposed in this paper is in fact an estimator of the PVCA. Our results suggest that the PVCA is a useful inferential object in any case:

• Any $d$-dimensional PVCA can be consistently estimated at a rate that is equivalent to a two-dimensional problem.

• If the simplifying assumption does hold, the PVCA coincides with the true density.

• If the simplifying assumption does not hold, inference of the PVCA is still less difficult than inference of the actual density. This leads to the following observation: on finite samples, a consistent estimate of the PVCA can be much closer to the true density than a consistent estimate of the density itself (see Scenario 2 in Section 6).

A related perspective on the phenomenon is that the simplifying assumption allows us to achieve more accurate estimates by model shrinkage. We incorporate the additional ‘information’ that the simplifying assumption is at least approximately true. This allows us to reduce the set of possible solutions and thereby make the estimation problem ‘less difficult’. The best-known example of a shrinkage estimator is the sample variance: when dividing by $n$ instead of $n - 1$, we give up unbiasedness of the estimator in order to achieve a smaller mean squared error. The same holds true for the vine copula based density estimator: if we make the simplifying assumption although it is not satisfied, we introduce additional bias. In fact, we even give up consistency of the estimator in order to achieve better finite-sample accuracy.
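For i.i.d. normal data, the shrinkage effect in the sample variance example can be quantified exactly, because $\sum_i (X_i - \bar X)^2/\sigma^2$ follows a $\chi^2$ distribution with $n - 1$ degrees of freedom. The Python sketch below is our own illustration, not part of the original study; it computes the mean squared error for both divisors. For $n = 10$ and $\sigma^2 = 1$ it gives $0.19$ for the divisor $n$ versus about $0.222$ for $n - 1$.

```python
def mse_variance_estimator(n, divisor, sigma2=1.0):
    """Exact MSE of S / divisor with S = sum (X_i - mean)^2 for i.i.d. N(mu, sigma2) data.

    Uses S / sigma2 ~ chi-square(n - 1): E[S] = (n - 1) sigma2, Var[S] = 2 (n - 1) sigma2^2.
    """
    df = n - 1
    bias = sigma2 * (df / divisor - 1.0)
    variance = 2.0 * df * sigma2 ** 2 / divisor ** 2
    return variance + bias ** 2


for n in (5, 10, 50):
    print(n, round(mse_variance_estimator(n, n), 4), round(mse_variance_estimator(n, n - 1), 4))
```

The biased estimator wins for every $n \ge 2$ under normality, since $(2n-1)/n^2 < 2/(n-1)$; the price of the small bias is more than paid back by the reduced variance.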

The main advantage of the vine copula based approach is striking: classical multivariate nonparametric density estimators converge very slowly to the true density when more than a few variables enter the model. Hence, one could hardly benefit from the increasing number of observations in modern data. A vine copula based estimator, on the other hand, converges at a high speed, no matter how many variables are involved. This makes it particularly appealing in the age of big data.
A. Proof of Theorem 1

The proof consists of three steps. In the first step, we show by induction that all pseudo-observations converge sufficiently fast to the true observations. In the second step, we establish pointwise consistency of the feasible pair-copula density estimators $\hat c_{j_e,k_e;D_e}$ and conditional distribution function estimators $\hat F_{j_e|D_e}$ and $\hat F_{k_e|D_e}$. In the last step, we combine these results to establish the consistency of $\hat f_{\mathrm{vine}}$.


Step 1: Convergence of pseudo-observations 

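We will show by induction that for all $e \in E_1, \dots, E_{d-1}$,

$$\max_{i=1,\dots,n}\big|\hat U^{(i)}_{j_e|D_e} - U^{(i)}_{j_e|D_e}\big| = O_{a.s.}(n^{-r}), \qquad \max_{i=1,\dots,n}\big|\hat U^{(i)}_{k_e|D_e} - U^{(i)}_{k_e|D_e}\big| = O_{a.s.}(n^{-r}). \qquad (18)$$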
Let $e \in E_1$ (the conditioning set $D_e$ is empty). Because of A1b we have

$$\max_{i=1,\dots,n}\big|\hat U^{(i)}_{j_e} - U^{(i)}_{j_e}\big| = \max_{i=1,\dots,n}\big|\hat F_{j_e}(X^{(i)}_{j_e}) - F_{j_e}(X^{(i)}_{j_e})\big| \le \sup_{x \in \mathbb{R}}\big|\hat F_{j_e}(x) - F_{j_e}(x)\big| = O_{a.s.}(n^{-r}),$$

and the same argument applies to the second equality of (18). Now consider $e \in E_m$, $1 \le m \le d-2$, and assume that (18) holds for all $e \in E_m$. Recall that all pseudo-observations for $e' \in E_{m+1}$ can be written as $U^{(i)}_{j_e|D_e \cup k_e}$ or $U^{(i)}_{k_e|D_e \cup j_e}$ for some $e \in E_m$. By the definition of the pseudo-observations and the triangle inequality,


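$$\begin{aligned}
\big|\hat U^{(i)}_{j_e|D_e \cup k_e} - U^{(i)}_{j_e|D_e \cup k_e}\big|
&= \big|\hat h_{j_e|k_e;D_e}(\hat U^{(i)}_{j_e|D_e} \mid \hat U^{(i)}_{k_e|D_e}) - h_{j_e|k_e;D_e}(U^{(i)}_{j_e|D_e} \mid U^{(i)}_{k_e|D_e})\big| \\
&\le \big|\hat h_{j_e|k_e;D_e}(\hat U^{(i)}_{j_e|D_e} \mid \hat U^{(i)}_{k_e|D_e}) - \bar h_{j_e|k_e;D_e}(\hat U^{(i)}_{j_e|D_e} \mid \hat U^{(i)}_{k_e|D_e})\big| \\
&\quad + \big|\bar h_{j_e|k_e;D_e}(\hat U^{(i)}_{j_e|D_e} \mid \hat U^{(i)}_{k_e|D_e}) - h_{j_e|k_e;D_e}(\hat U^{(i)}_{j_e|D_e} \mid \hat U^{(i)}_{k_e|D_e})\big| \\
&\quad + \big|h_{j_e|k_e;D_e}(\hat U^{(i)}_{j_e|D_e} \mid \hat U^{(i)}_{k_e|D_e}) - h_{j_e|k_e;D_e}(U^{(i)}_{j_e|D_e} \mid U^{(i)}_{k_e|D_e})\big| \\
&=: H_{1,n} + H_{2,n} + H_{3,n}.
\end{aligned}$$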


Note that, almost surely, each realization of $(U^{(i)}_{j_e|D_e}, U^{(i)}_{k_e|D_e})$ is contained in $[\delta_1, 1 - \delta_1]^2$ for $\delta_1 := \min_{i=1,\dots,n}\min\{U^{(i)}_{j_e|D_e}, U^{(i)}_{k_e|D_e}, 1 - U^{(i)}_{j_e|D_e}, 1 - U^{(i)}_{k_e|D_e}\} > 0$. And by invoking (18) we see that, for sufficiently large $n$, also each realization of $(\hat U^{(i)}_{j_e|D_e}, \hat U^{(i)}_{k_e|D_e})$ is contained in $[\delta_1/2, 1 - \delta_1/2]^2$. Together with A2b and A3b this yields for large $n$,

$$H_{1,n} \le \sup_{(u,v) \in [\delta_1/2,\, 1-\delta_1/2]^2}\big|\hat h_{j_e|k_e;D_e}(u|v) - \bar h_{j_e|k_e;D_e}(u|v)\big| = O_{a.s.}(a_{e,n}),$$

$$H_{2,n} \le \sup_{(u,v) \in [\delta_1/2,\, 1-\delta_1/2]^2}\big|\bar h_{j_e|k_e;D_e}(u|v) - h_{j_e|k_e;D_e}(u|v)\big| = O_{a.s.}(n^{-r}),$$

and, invoking (18),

$$a_{e,n} = \sup_{i=1,\dots,n}\big|\hat U^{(i)}_{j_e|D_e} - U^{(i)}_{j_e|D_e}\big| + \sup_{i=1,\dots,n}\big|\hat U^{(i)}_{k_e|D_e} - U^{(i)}_{k_e|D_e}\big| = O_{a.s.}(n^{-r}),$$


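which gives $H_{1,n} = O_{a.s.}(n^{-r})$. It remains to show that $H_{3,n} = O_{a.s.}(n^{-r})$. Let $\nabla h_{j_e|k_e;D_e}$ denote the gradient of $h_{j_e|k_e;D_e}$. A first-order Taylor approximation of $h_{j_e|k_e;D_e}(\hat U^{(i)}_{j_e|D_e} \mid \hat U^{(i)}_{k_e|D_e})$ around $(U^{(i)}_{j_e|D_e}, U^{(i)}_{k_e|D_e})$ yields

$$H_{3,n} \le \Big|\nabla h_{j_e|k_e;D_e}(U^{(i)}_{j_e|D_e} \mid U^{(i)}_{k_e|D_e})^\top \begin{pmatrix} \hat U^{(i)}_{j_e|D_e} - U^{(i)}_{j_e|D_e} \\ \hat U^{(i)}_{k_e|D_e} - U^{(i)}_{k_e|D_e} \end{pmatrix}\Big| + o_{a.s.}\Big(\big|\hat U^{(i)}_{j_e|D_e} - U^{(i)}_{j_e|D_e}\big| + \big|\hat U^{(i)}_{k_e|D_e} - U^{(i)}_{k_e|D_e}\big|\Big).$$

Invoking (18), we get $H_{3,n} = O_{a.s.}(n^{-r})$. This establishes the first equality of (18) for all $e' \in E_{m+1}$. The second equality follows by symmetric arguments and the induction is complete.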
Step 2: Consistency of conditional cdf and pair-copula density estimators

With arguments almost identical to those in Step 1, we can furthermore show that for all $e \in E_1, \dots, E_{d-1}$, and all $x \in \mathbb{R}^d$,

$$\hat F_{j_e|D_e}(x_{j_e}|x_{D_e}) - F_{j_e|D_e}(x_{j_e}|x_{D_e}) = O_p(n^{-r}), \qquad \hat F_{k_e|D_e}(x_{k_e}|x_{D_e}) - F_{k_e|D_e}(x_{k_e}|x_{D_e}) = O_p(n^{-r}). \qquad (19)$$

Next, we establish that for all $e \in E_1, \dots, E_{d-1}$, and all $(u,v) \in (0,1)^2$,

$$\hat c_{j_e,k_e;D_e}(u,v) - c_{j_e,k_e;D_e}(u,v) = O_p(n^{-r}). \qquad (20)$$

The triangle inequality gives

$$\big|\hat c_{j_e,k_e;D_e}(u,v) - c_{j_e,k_e;D_e}(u,v)\big| \le \big|\hat c_{j_e,k_e;D_e}(u,v) - \bar c_{j_e,k_e;D_e}(u,v)\big| + \big|\bar c_{j_e,k_e;D_e}(u,v) - c_{j_e,k_e;D_e}(u,v)\big| =: R_{n,1} + R_{n,2}.$$

We have $R_{n,1} = O_{a.s.}(n^{-r})$ by A3a and (18), whereas $R_{n,2} = O_p(n^{-r})$ by A2a.

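Step 3: Consistency of the vine copula based density estimator

The consistency of $\hat f_{\mathrm{vine}}$ now follows from (20) and A1a (second equality) together with (19) and the fact that $c_{j_e,k_e;D_e}$ is continuously differentiable (third equality):

$$\begin{aligned}
\hat f_{\mathrm{vine}}(x) &= \prod_{k=1}^{d-1}\prod_{e \in E_k} \hat c_{j_e,k_e;D_e}\big\{\hat F_{j_e|D_e}(x_{j_e}|x_{D_e}),\, \hat F_{k_e|D_e}(x_{k_e}|x_{D_e})\big\} \times \prod_{j=1}^d \hat f_j(x_j) \\
&= \prod_{k=1}^{d-1}\prod_{e \in E_k} \Big[c_{j_e,k_e;D_e}\big\{\hat F_{j_e|D_e}(x_{j_e}|x_{D_e}),\, \hat F_{k_e|D_e}(x_{k_e}|x_{D_e})\big\} + O_p(n^{-r})\Big] \times \prod_{j=1}^d \big\{f_j(x_j) + O_p(n^{-r})\big\} \\
&= \prod_{k=1}^{d-1}\prod_{e \in E_k} \Big[c_{j_e,k_e;D_e}\big\{F_{j_e|D_e}(x_{j_e}|x_{D_e}),\, F_{k_e|D_e}(x_{k_e}|x_{D_e})\big\} + O_p(n^{-r}) + O_p(n^{-r})\Big] \times \prod_{j=1}^d \big\{f_j(x_j) + O_p(n^{-r})\big\} \\
&= f(x) + O_p(n^{-r}).
\end{aligned}$$

□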
B. Lemmas

B.1. Notation

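To ease notation in the following proofs, we write $(u,v) = (w_1, w_2) = (\Phi(z_1), \Phi(z_2))$,

$$W^{(i)}_1 := U^{(i)}_{j_e|D_e}, \quad W^{(i)}_2 := U^{(i)}_{k_e|D_e}, \quad Z^{(i)}_j := \Phi^{-1}(W^{(i)}_j), \quad \psi := \psi_{j_e,k_e;D_e}. \qquad (21)$$

In this notation, and with $K_{b_n}(t) := K(t/b_n)/b_n$, the (oracle) transformation pair-copula density estimator is

$$\bar c(u,v) = \bar c\{\Phi(z_1), \Phi(z_2)\} = \frac{\frac{1}{n}\sum_{i=1}^n K_{b_n}(z_1 - Z^{(i)}_1)\,K_{b_n}(z_2 - Z^{(i)}_2)}{\phi(z_1)\,\phi(z_2)}.$$

The corresponding (oracle) h-function estimator $\bar h$ is obtained by integration of $\bar c$:

$$\bar h(u|v) = \frac{\frac{1}{n}\sum_{i=1}^n \mathcal{K}_{b_n}(z_1 - Z^{(i)}_1)\,K_{b_n}(z_2 - Z^{(i)}_2)}{\phi(z_2)}, \qquad (22)$$

where $\mathcal{K}_{b_n}(t) := \int_{-\infty}^t K_{b_n}(s)\,ds$. The feasible estimators $\hat c$ and $\hat h$ are obtained by replacing $W^{(i)}_j$ and $Z^{(i)}_j$ with pseudo-observations $\hat W^{(i)}_j$ and $\hat Z^{(i)}_j := \Phi^{-1}(\hat W^{(i)}_j)$. Finally, we write

$$a_n := \sup_{i=1,\dots,n}\big|\hat W^{(i)}_1 - W^{(i)}_1\big| + \sup_{i=1,\dots,n}\big|\hat W^{(i)}_2 - W^{(i)}_2\big|.$$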






B.2. Results 


Lemma B1. Under conditions K1, K2, C1, and C2 it holds for all $(u,v) \in (0,1)^2$,

$$\hat c(u,v) = \bar c(u,v) + O_{a.s.}(a_n).$$


Proof. By a first-order Taylor approximation of <I> j = 1, 2, 

zf - zf = (j?f - wf)l<l,(zf) + 0 .,^,xwf - ivf) 

= l/4>(zf) X 0....(o„), ^ 

where the Oa.s.(an) term does not depend on the index i since the supremum was 
taken. Denote Vz = {d/dzi,d/dz 2 y' ■ A hrst-order Taylor approximation of K 
yields 


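$$
\begin{aligned}
&\phi(z_1)\phi(z_2)\,\bigl|\hat c\{\Phi(z_1),\Phi(z_2)\} - \tilde c\{\Phi(z_1),\Phi(z_2)\}\bigr| \\
&\quad= \biggl|\frac1n\sum_{i=1}^n K_{b_n}\bigl(z_1 - \hat Z_1^{(i)}\bigr)K_{b_n}\bigl(z_2 - \hat Z_2^{(i)}\bigr) - \frac1n\sum_{i=1}^n K_{b_n}\bigl(z_1 - Z_1^{(i)}\bigr)K_{b_n}\bigl(z_2 - Z_2^{(i)}\bigr)\biggr| \\
&\quad= \biggl|\frac1n\sum_{i=1}^n \nabla_z\bigl\{K_{b_n}\bigl(z_1 - Z_1^{(i)}\bigr)K_{b_n}\bigl(z_2 - Z_2^{(i)}\bigr)\bigr\}^\top \begin{pmatrix}\hat Z_1^{(i)} - Z_1^{(i)} \\ \hat Z_2^{(i)} - Z_2^{(i)}\end{pmatrix}\biggr| \\
&\quad\le \biggl|\frac1n\sum_{i=1}^n \nabla_z\bigl\{K_{b_n}\bigl(z_1 - Z_1^{(i)}\bigr)K_{b_n}\bigl(z_2 - Z_2^{(i)}\bigr)\bigr\}^\top \begin{pmatrix}1/\phi\bigl(Z_1^{(i)}\bigr) \\ 1/\phi\bigl(Z_2^{(i)}\bigr)\end{pmatrix}\biggr| \times O_{a.s.}(a_n),
\end{aligned}
$$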


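where the last inequality is due to (23). Since $K_{b_n}$ is zero outside of $[-b_n, b_n]$, we can bound this further by

$$\eta_n(z) \times \biggl|\frac1n\sum_{i=1}^n \nabla_z\bigl\{K_{b_n}\bigl(z_1 - Z_1^{(i)}\bigr)K_{b_n}\bigl(z_2 - Z_2^{(i)}\bigr)\bigr\}\biggr| \times O_{a.s.}(a_n), \qquad (24)$$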


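where $\eta_n(z) := \sup\bigl\{\|(1/\phi(y_1), 1/\phi(y_2))\| : |y_1 - z_1| \le b_n,\ |y_2 - z_2| \le b_n\bigr\} = O(1)$ for all $z \in \mathbb R^2$. The second term is the absolute value of the gradient of a classical kernel density estimator. Since the derivatives of $\psi$ are continuous and bounded by C2, it holds that

$$\biggl|\frac1n\sum_{i=1}^n \nabla_z\bigl\{K_{b_n}\bigl(z_1 - Z_1^{(i)}\bigr)K_{b_n}\bigl(z_2 - Z_2^{(i)}\bigr)\bigr\}\biggr| = \bigl|\nabla_z\psi(z_1, z_2)\bigr| + O_{a.s.}(1);$$

see Theorem 9 in [23]. Plugging this into (24) proves our claim.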


□ 
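The expansion (23) uses only that the derivative of $\Phi^{-1}$ at $w$ equals $1/\phi\{\Phi^{-1}(w)\}$. The following Python check compares the exact increment $\Phi^{-1}(\hat w) - \Phi^{-1}(w)$ with its first-order approximation. The helper name is hypothetical, and the check relies only on the standard-library `statistics.NormalDist`. The gap should shrink roughly quadratically in $\hat w - w$, in line with the $O_{a.s.}\{(\hat W_j^{(i)} - W_j^{(i)})^2\}$ remainder.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def taylor_gap(w, w_hat):
    """Return (exact, first_order) for Phi^{-1}(w_hat) - Phi^{-1}(w), cf. (23)."""
    z = STD_NORMAL.inv_cdf(w)
    exact = STD_NORMAL.inv_cdf(w_hat) - z
    first_order = (w_hat - w) / STD_NORMAL.pdf(z)  # (w_hat - w) / phi(Z)
    return exact, first_order


for w in (0.1, 0.5, 0.9):
    for step in (1e-3, 1e-4):
        exact, approx = taylor_gap(w, w + step)
        print(f"w={w}, step={step:g}: error={exact - approx:.3e}")
```

For $w \neq 0.5$, reducing the step by a factor of 10 should reduce the printed error by roughly a factor of 100. At $w = 0.5$, the second derivative of $\Phi^{-1}$ vanishes, so the error is even smaller.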


Lemma B2. Under conditions K1, K2, C1, and C2 it holds for all $\delta \in (0, 0.5]$ that

$$\sup_{(u,v)\in[\delta,1-\delta]^2} \bigl|\tilde h(u|v) - h(u|v)\bigr| = O_{a.s.}\Bigl\{b_n^2 + \sqrt{\ln n/(n b_n)}\Bigr\}.$$

Proof. Equations (40) and (41) in [22] yield

$$\mathrm E\{\tilde h(u|v)\} - h(u|v) = b_n^2\,\beta(u,v) + o(b_n^2)$$

for some bias term $\beta(u,v)$ involving $h$ and $\phi$ as well as their first- and second-order derivatives. Since all parts are continuous on $[\delta, 1-\delta]^2$ by C1 for all $\delta \in (0, 0.5]$, it holds that

$$\sup_{(u,v)\in[\delta,1-\delta]^2} \bigl|\mathrm E\{\tilde h(u|v)\} - h(u|v)\bigr| = O_{a.s.}(b_n^2).$$

On the other hand, Lemma 2.2 of [24] ensures that

$$\sup_{(u,v)\in[\delta,1-\delta]^2} \bigl|\tilde h(u|v) - \mathrm E\{\tilde h(u|v)\}\bigr| = O_{a.s.}\Bigl\{\sqrt{\ln n/(n b_n)}\Bigr\}.$$


Combining the previous two equations concludes the proof. 


□ 
Lemma B3. Under conditions K1, K2, C1, and C2 it holds for all $\delta \in (0, 0.5]$ that

$$\sup_{(u,v)\in[\delta,1-\delta]^2} \bigl|\hat h(u|v) - \tilde h(u|v)\bigr| = O_{a.s.}(a_n).$$


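Proof. With arguments similar to those in the proof of Lemma B1, we can show that

$$
\begin{aligned}
\sup_{(u,v)\in[\delta,1-\delta]^2} \bigl|\hat h(u|v) - \tilde h(u|v)\bigr|
&= \sup_{z\in[\Phi^{-1}(\delta),\,\Phi^{-1}(1-\delta)]^2} \bigl|\hat h\{\Phi(z_1)|\Phi(z_2)\} - \tilde h\{\Phi(z_1)|\Phi(z_2)\}\bigr| \\
&\le \sup_{z\in[\Phi^{-1}(\delta),\,\Phi^{-1}(1-\delta)]^2} \frac{\eta_n(z)}{\phi(z_2)}\,\bigl|\nabla_z h\{\Phi(z_1)|\Phi(z_2)\}\bigr| \times O_{a.s.}(a_n),
\end{aligned}
$$

where $\eta_n(z)$ is defined as in the proof of Lemma B1 and the $O_{a.s.}$ term is independent of $z$. The supremum on the right-hand side is $O(1)$ because all functions are continuous in $z$ on every compact subset of $\mathbb R^2$. As a result, the right-hand side can be bounded by a constant times the $O_{a.s.}(a_n)$ term. This establishes our claim. □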


C. Proof of Proposition 5 

From Proposition 2 and condition (v) in Proposition 5, we get for all $j = 1, \dots, d$ and $x \in \mathbb R$ that $\hat f_j(x) = f_j(x) + O_p(\cdots)$. This implies $(nb_{n,c}^2)^{1/2}\{\hat f_j(x) - f_j(x)\} = o_p(1)$, so the first $d$ components of (16) converge to zero in probability. Hence, the first $d$ components of $\mu_x$, as well as the first $d$ rows and columns of $\Sigma_x$, are zero, and we only have to deal with the remaining components in (16).

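From (20) and (19) in the proof of Theorem 1 and from Proposition 3, we furthermore know that $\hat c_{j_e,k_e;D_e}(u,v) = c_{j_e,k_e;D_e}(u,v) + O_p\{b_{n,c}^2 + (nb_{n,c}^2)^{-1/2}\}$ as well as $\hat F_{j_e|D_e}(x_{j_e}|x_{D_e}) = F_{j_e|D_e}(x_{j_e}|x_{D_e}) + O_p\{b_{n,c}^2 + (nb_{n,c})^{-1/2}\}$. Similar to Lemma B3, we can now show that

$$\hat c_{j_e,k_e;D_e}\bigl\{\hat F_{j_e|D_e}(x_{j_e}|x_{D_e}), \hat F_{k_e|D_e}(x_{k_e}|x_{D_e})\bigr\} = \tilde c_{j_e,k_e;D_e}\bigl\{F_{j_e|D_e}(x_{j_e}|x_{D_e}), F_{k_e|D_e}(x_{k_e}|x_{D_e})\bigr\} + O_p\bigl\{b_{n,c}^2 + (nb_{n,c})^{-1/2}\bigr\}.$$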

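Hence, for (16) to hold it suffices to show that

$$(nb_{n,c}^2)^{1/2}\bigl\{\tilde c^*(x) - b_{n,c}^2\mu_x - c^*(x)\bigr\} \overset{d}{\to} \mathcal N(0, \Sigma_x), \qquad (25)$$

where

$$\tilde c^*(x) := \Bigl(\tilde c_{j_e,k_e;D_e}\bigl\{F_{j_e|D_e}(x_{j_e}|x_{D_e}), F_{k_e|D_e}(x_{k_e}|x_{D_e})\bigr\}\Bigr)_{e\in E_1,\dots,E_{d-1}},$$

and $c^*(x)$ is defined similarly, but with $\tilde c_{j_e,k_e;D_e}$ replaced by $c_{j_e,k_e;D_e}$.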
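Define $Z_{j_e|D_e} := \Phi^{-1}\{F_{j_e|D_e}(x_{j_e}|x_{D_e})\}$ and $Z_{k_e|D_e} := \Phi^{-1}\{F_{k_e|D_e}(x_{k_e}|x_{D_e})\}$. Let $Z^{(i)}_{j_e|D_e} := \Phi^{-1}\bigl(U^{(i)}_{j_e|D_e}\bigr)$ and $Z^{(i)}_{k_e|D_e} := \Phi^{-1}\bigl(U^{(i)}_{k_e|D_e}\bigr)$. Let $Y_{n,i} := (Y_{n,i,e})_{e\in E_1,\dots,E_{d-1}}$ be a vector with entries

$$Y_{n,i,e} := \frac{(nb_{n,c}^2)^{1/2}}{n}\,\frac{K_{b_{n,c}}\bigl(Z_{j_e|D_e} - Z^{(i)}_{j_e|D_e}\bigr)\,K_{b_{n,c}}\bigl(Z_{k_e|D_e} - Z^{(i)}_{k_e|D_e}\bigr)}{\phi(Z_{j_e|D_e})\,\phi(Z_{k_e|D_e})}.$$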





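Then, $\sum_{i=1}^n Y_{n,i} = (nb_{n,c}^2)^{1/2}\,\tilde c^*(x)$. By the multivariate Lindeberg-Feller central limit theorem (Proposition 2.27 in [50]), (25) holds when

$$\sum_{i=1}^n \mathrm E(Y_{n,i}) = (nb_{n,c}^2)^{1/2}\bigl\{c^*(x) + b_{n,c}^2\mu_x + o(b_{n,c}^2)\bigr\}, \qquad (26)$$

$$\sum_{i=1}^n \mathrm{cov}(Y_{n,i}) \to \Sigma_x, \qquad (27)$$

$$\sum_{i=1}^n \mathrm E\bigl\{\|Y_{n,i}\|^2\,\mathbb 1(\|Y_{n,i}\| > \varepsilon)\bigr\} \to 0 \quad \text{for all } \varepsilon > 0. \qquad (28)$$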


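Since the $Y_{n,i}$, $i = 1, \dots, n$, are independent and identically distributed, it holds that

$$\sum_{i=1}^n \mathrm E(Y_{n,i}) = n\,\mathrm E(Y_{n,1}), \qquad \sum_{i=1}^n \mathrm{cov}(Y_{n,i}) = n\,\mathrm{cov}(Y_{n,1}).$$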

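Denote further $u_{j_e|D_e} := F_{j_e|D_e}(x_{j_e}|x_{D_e})$ and $u_{k_e|D_e} := F_{k_e|D_e}(x_{k_e}|x_{D_e})$. Corollary 3.4 in [35] gives

$$n\,\mathrm E(Y_{n,1,e}) = (nb_{n,c}^2)^{1/2}\bigl\{c_{j_e,k_e;D_e}(u_{j_e|D_e}, u_{k_e|D_e}) + b_{n,c}^2\,\mu_{x,e} + o(b_{n,c}^2)\bigr\}, \qquad (29)$$

where

$$
\begin{aligned}
\mu_{x,e} := \frac{\sigma_K^2}{2}\biggl[\,&\frac{\partial^2 c_{j_e,k_e;D_e}(u_{j_e|D_e}, u_{k_e|D_e})}{\partial u_{j_e|D_e}^2}\,\phi^2(Z_{j_e|D_e}) + \frac{\partial^2 c_{j_e,k_e;D_e}(u_{j_e|D_e}, u_{k_e|D_e})}{\partial u_{k_e|D_e}^2}\,\phi^2(Z_{k_e|D_e}) \\
&- 3\,\frac{\partial c_{j_e,k_e;D_e}(u_{j_e|D_e}, u_{k_e|D_e})}{\partial u_{j_e|D_e}}\,\phi(Z_{j_e|D_e})\,Z_{j_e|D_e} - 3\,\frac{\partial c_{j_e,k_e;D_e}(u_{j_e|D_e}, u_{k_e|D_e})}{\partial u_{k_e|D_e}}\,\phi(Z_{k_e|D_e})\,Z_{k_e|D_e} \\
&+ c_{j_e,k_e;D_e}(u_{j_e|D_e}, u_{k_e|D_e})\bigl(Z_{j_e|D_e}^2 + Z_{k_e|D_e}^2 - 2\bigr)\biggr], \qquad (30)
\end{aligned}
$$

and $\sigma_K^2 := \int_{-\infty}^{\infty} x^2 K(x)\,dx$. This validates (26). By the change of variables $s_1 = (z_1 - Z_{j_e|D_e})/b_{n,c}$, $s_2 = (z_2 - Z_{k_e|D_e})/b_{n,c}$, and a Taylor approximation of $\psi_{j_e,k_e;D_e}$ (as defined in C2), we get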




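$$
\begin{aligned}
n\,\mathrm E\bigl(Y_{n,1,e}^2\bigr)
&= n\,\mathrm E\biggl[\frac{b_{n,c}^2}{n}\,\frac{K_{b_{n,c}}^2\bigl(Z_{j_e|D_e} - Z^{(1)}_{j_e|D_e}\bigr)\,K_{b_{n,c}}^2\bigl(Z_{k_e|D_e} - Z^{(1)}_{k_e|D_e}\bigr)}{\phi^2(Z_{j_e|D_e})\,\phi^2(Z_{k_e|D_e})}\biggr] \\
&= \frac{1}{\phi^2(Z_{j_e|D_e})\,\phi^2(Z_{k_e|D_e})}\int_{\mathbb R}\int_{\mathbb R} K^2(s_1)K^2(s_2)\,\psi_{j_e,k_e;D_e}\bigl(Z_{j_e|D_e} + b_{n,c}s_1,\, Z_{k_e|D_e} + b_{n,c}s_2\bigr)\,ds_1\,ds_2 \\
&= R_K^2\,\frac{\psi_{j_e,k_e;D_e}(Z_{j_e|D_e}, Z_{k_e|D_e})}{\phi^2(Z_{j_e|D_e})\,\phi^2(Z_{k_e|D_e})} + o(1), \qquad (31)
\end{aligned}
$$

where $R_K := \int_{-\infty}^{\infty} K^2(s)\,ds$. Using (31) and (29), we obtain

$$n\,\mathrm{var}(Y_{n,1,e}) = R_K^2\,\frac{c_{j_e,k_e;D_e}(u_{j_e|D_e}, u_{k_e|D_e})}{\phi(Z_{j_e|D_e})\,\phi(Z_{k_e|D_e})} + o(1) =: \sigma_{x,e}^2 + o(1). \qquad (32)$$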
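A numerical sanity check of (31) and (32) is possible under two simplifying assumptions that are not part of the proof. First, the pair-copula is the independence copula, so that $c_{j_e,k_e;D_e} \equiv 1$ and $\psi_{j_e,k_e;D_e}(z_1, z_2) = \phi(z_1)\phi(z_2)$. Second, $K$ is the Epanechnikov kernel, for which $\int K^2(s)\,ds = 3/5$. After the change of variables the factors of $b_{n,c}$ cancel, so $n\,\mathrm E(Y_{n,1,e}^2)$ becomes a product of two one-dimensional integrals. This product should approach $(\int K^2)^2/\{\phi(Z_{j_e|D_e})\phi(Z_{k_e|D_e})\}$ as $b_{n,c} \to 0$. The function names are illustrative only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def kernel(s):
    """Assumed Epanechnikov kernel on [-1, 1]."""
    return 0.75 * (1.0 - s * s) if abs(s) <= 1.0 else 0.0


def midpoint(f, lo, hi, steps=4000):
    """Composite midpoint rule for int_lo^hi f(s) ds."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (k + 0.5) * width) for k in range(steps))


def kernel_roughness():
    """int K^2(s) ds, which equals 3/5 for the Epanechnikov kernel."""
    return midpoint(lambda s: kernel(s) ** 2, -1.0, 1.0)


def second_moment(z_j, z_k, b):
    """n E(Y_{n,1,e}^2) under independence, after substituting z = Z + b s."""
    inner_j = midpoint(lambda s: kernel(s) ** 2 * STD_NORMAL.pdf(z_j + b * s), -1.0, 1.0)
    inner_k = midpoint(lambda s: kernel(s) ** 2 * STD_NORMAL.pdf(z_k + b * s), -1.0, 1.0)
    return inner_j * inner_k / (STD_NORMAL.pdf(z_j) ** 2 * STD_NORMAL.pdf(z_k) ** 2)


def limiting_variance(z_j, z_k):
    """sigma^2_{x,e} from (32) with c = 1."""
    return kernel_roughness() ** 2 / (STD_NORMAL.pdf(z_j) * STD_NORMAL.pdf(z_k))


for b in (0.5, 0.1, 0.01):
    print(b, second_moment(0.3, -0.7, b), limiting_variance(0.3, -0.7))
```

The $n\{\mathrm E(Y_{n,1,e})\}^2$ part of the variance is of order $b_{n,c}^2$ and is therefore omitted here; it vanishes in the limit, as used in (32).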


Arguments similar to (31) show that for any two edges $e \neq e'$, it holds that $n\,\mathrm E(Y_{n,1,e}Y_{n,1,e'}) = O(b_{n,c})$ and, together with (29), $n\,\mathrm{cov}(Y_{n,1,e}, Y_{n,1,e'}) \to 0$. We have shown that (27) holds with $\Sigma_x$ being a diagonal matrix with diagonal entries $\sigma_{x,e}^2$ given in (32).
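The bias term (30) can be cross-checked numerically. For a symmetric second-order kernel, the bias of the bivariate kernel estimator on the normal scale is $\tfrac12 b_{n,c}^2\sigma_K^2\,\Delta\psi_{j_e,k_e;D_e}$, and dividing by $\phi(Z_{j_e|D_e})\phi(Z_{k_e|D_e})$ should reproduce $b_{n,c}^2\mu_{x,e}$. The sketch below checks this with a finite-difference Laplacian. It uses an illustrative Farlie-Gumbel-Morgenstern (FGM) pair-copula, $c(u,v) = 1 + \theta(1-2u)(1-2v)$, chosen only because its derivatives are simple, together with the Epanechnikov value $\sigma_K^2 = 1/5$. Both choices, and all function names, are assumptions of this example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
SIGMA_K2 = 0.2  # int x^2 K(x) dx for the (assumed) Epanechnikov kernel


def fgm_density(u, v, theta):
    """Illustrative FGM copula density standing in for c_{j_e,k_e;D_e}."""
    return 1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)


def bias_formula(u, v, theta):
    """Closed form (30) with the FGM derivatives plugged in."""
    z1, z2 = STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v)
    p1, p2 = STD_NORMAL.pdf(z1), STD_NORMAL.pdf(z2)
    c_u = -2.0 * theta * (1.0 - 2.0 * v)
    c_v = -2.0 * theta * (1.0 - 2.0 * u)
    c_uu = c_vv = 0.0  # the FGM density is linear in each argument
    bracket = (c_uu * p1 ** 2 + c_vv * p2 ** 2
               - 3.0 * c_u * p1 * z1 - 3.0 * c_v * p2 * z2
               + fgm_density(u, v, theta) * (z1 ** 2 + z2 ** 2 - 2.0))
    return 0.5 * SIGMA_K2 * bracket


def bias_numeric(u, v, theta, h=1e-3):
    """(sigma_K^2 / 2) * Laplacian(psi) / (phi(z1) phi(z2)) via central differences."""
    def psi(s, t):
        # density of the normal-scale pair: c{Phi(s), Phi(t)} phi(s) phi(t)
        return (fgm_density(STD_NORMAL.cdf(s), STD_NORMAL.cdf(t), theta)
                * STD_NORMAL.pdf(s) * STD_NORMAL.pdf(t))

    z1, z2 = STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v)
    laplacian = (psi(z1 + h, z2) + psi(z1 - h, z2) + psi(z1, z2 + h)
                 + psi(z1, z2 - h) - 4.0 * psi(z1, z2)) / h ** 2
    return 0.5 * SIGMA_K2 * laplacian / (STD_NORMAL.pdf(z1) * STD_NORMAL.pdf(z2))


print(bias_formula(0.3, 0.7, 0.5), bias_numeric(0.3, 0.7, 0.5))
```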

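Instead of checking the remaining condition (28) directly, we verify the stronger Lyapunov-type condition $n\,\mathrm E(\|Y_{n,1}\|^3) \to 0$. By Jensen's inequality we get

$$n\,\mathrm E\bigl(\|Y_{n,1}\|^3\bigr) = n\,\mathrm E\biggl\{\biggl(\sum_{m=1}^{d-1}\sum_{e\in E_m} Y_{n,1,e}^2\biggr)^{3/2}\biggr\} \le n\sqrt{d(d-1)/2}\,\sum_{m=1}^{d-1}\sum_{e\in E_m}\mathrm E\bigl(|Y_{n,1,e}|^3\bigr),$$

where $d(d-1)/2$ is the number of terms in the double sum. Hence, it suffices to show that $n\,\mathrm E(|Y_{n,1,e}|^3) \to 0$ for any $e \in E_1, \dots, E_{d-1}$. Similar to (31), we get $n\,\mathrm E(|Y_{n,1,e}|^3) = O\{(nb_{n,c}^2)^{-1/2}\}$, which is $o(1)$. □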
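The Jensen step rests on the elementary bound $\bigl(\sum_{k=1}^N a_k^2\bigr)^{3/2} \le \sqrt N \sum_{k=1}^N |a_k|^3$. It follows from convexity of $t \mapsto t^{3/2}$ applied to the average of the $a_k^2$; it is used pointwise with $N = d(d-1)/2$, and expectations are taken afterwards. A short Python check on deterministic test vectors confirms the bound. It also confirms that the bound is attained when all $|a_k|$ coincide. The function names are hypothetical.

```python
import math


def norm_cubed(values):
    """||a||^3 = (sum_k a_k^2)^{3/2}."""
    return sum(a * a for a in values) ** 1.5


def jensen_bound(values):
    """sqrt(N) * sum_k |a_k|^3, from convexity of t -> t^{3/2}."""
    return math.sqrt(len(values)) * sum(abs(a) ** 3 for a in values)


d = 5
n_edges = d * (d - 1) // 2  # number of pair-copulas in a d-dimensional vine
vectors = [[math.sin(3.0 * k + t) for k in range(n_edges)] for t in range(50)]
print(max(norm_cubed(v) / jensen_bound(v) for v in vectors))  # at most 1
```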